Abstract—Entropy and differential entropy are important
quantities in information theory. A tractable extension to singular 
random variables—which are neither discrete nor continuous— 
has not been available so far. Here, we present such an extension 
for the practically relevant class of integer-dimensional singular 
random variables. The proposed entropy definition contains the 
entropy of discrete random variables and the differential entropy 
of continuous random variables as special cases. We show that 
it transforms in a natural manner under Lipschitz functions, 
and that it is invariant under unitary transformations. We define 
joint entropy and conditional entropy for integer-dimensional 
singular random variables, and we show that the proposed 
entropy conveys useful expressions of the mutual information. As 
first applications of our entropy definition, we present a result 
on the minimal expected codeword length of quantized integer-dimensional
singular sources and a Shannon lower bound for
integer-dimensional singular sources. 

Index Terms —Information entropy, rate distortion theory, 
Shannon lower bound, singular random variables, source coding. 


I. Introduction 
A. Background and Motivation 

Entropy is one of the fundamental concepts in information 
theory. The classical definition of entropy for discrete random 
variables and its interpretation as information content go back 
to Shannon [1] and were analyzed thoroughly from axiomatic 
[2] and operational [1] viewpoints. A similar definition for 
continuous random variables, differential entropy, was also 
introduced by Shannon [1], but its interpretation as information 
content is controversial [3]. Nonetheless, information-theoretic 
derivations involving undisputed quantities like Kullback- 
Leibler divergence or mutual information between continuous 
random variables can often be simplified using differential 
entropy. Furthermore, in rate-distortion theory, a lower bound 
on the rate-distortion function known as the Shannon lower 
bound can be calculated using differential entropy [4, Sec. 4.6]. 
Finally, differential entropy arises in asymptotic expansions of
the entropy of ever finer quantizations of a continuous random
variable [3, Sec. IV]. Hence, although the interpretation of differential
entropy is disputed, its operational relevance renders
it a useful quantity.

The concepts of entropy and differential entropy thus 
simplify the understanding and information-theoretic treatment
of discrete and continuous random variables. However,
these two kinds of random variables do not cover all interesting
information-theoretic problems. In fact, a number
of information-theoretic problems involving singular random 
variables, which are neither discrete nor continuous, have been 
described recently: 

• For the vector interference channel, a singular input 
distribution has to be used to fully utilize the available 
degrees of freedom [5].

• In a probabilistic formulation of analog compression, the 
underlying source distribution is singular [6]. 

• In block-fading channel models, two different kinds of 
singular distributions arise: the optimal input distribution 
is singular in some settings [7, Ch. 6], and the noiseless 
output distribution is singular except for special cases [8]. 

Thus, a suitable generalization of (differential) entropy to singular
random variables has the potential to simplify theoretical
work in these areas and to provide valuable insights. 

Another field where singular random variables appear is 
source coding. In many high-dimensional problems, deterministic
dependencies reduce the intrinsic dimension of a source.
Thus, the random variable describing the source cannot be
continuous but often is not discrete either. A basic example
is a random variable $\boldsymbol{\mathsf{x}} = (\mathsf{x}_1\;\mathsf{x}_2)^T \in \mathbb{R}^2$ supported on
the unit circle, i.e., exhibiting the deterministic dependence
$\mathsf{x}_1^2 + \mathsf{x}_2^2 = 1$. Although $\boldsymbol{\mathsf{x}}$ is defined on $\mathbb{R}^2$ and both components
$\mathsf{x}_1, \mathsf{x}_2$ are continuous random variables, $\boldsymbol{\mathsf{x}}$ itself is intrinsically
only one-dimensional. The differential entropy of $\boldsymbol{\mathsf{x}}$ is not
defined and, in fact, classical information theory does not
provide a rigorous definition of entropy for this random
variable. Another, less trivial, example of a singular random
variable is a rank-one random matrix of the form $\boldsymbol{\mathsf{X}} = \boldsymbol{\mathsf{z}}\boldsymbol{\mathsf{z}}^T$,
where $\boldsymbol{\mathsf{z}}$ is a continuous random vector.

The case of arbitrary probability distributions is very hard 
to handle, and due to its generality even the mere definition 
of a meaningful entropy seems impossible. Two existing 
approaches to defining (differential) entropy for more general 
distributions are based on quantizations of the random variable
in question. Usually, the entropy of these discretizations
diverges to infinity and, thus, a normalization has to be
employed to obtain a useful result. In [9], this approach is
adopted for very specific quantizations of a random variable.
Unfortunately, this does not always result in a well-defined
entropy and sometimes even fails for continuous random variables
of finite differential entropy [9, pp. 197f]. Moreover, the
quantization process seems difficult to deal with analytically,
and no theory was built based on this definition of entropy.¹ A
similar approach is to consider arbitrary quantizations that are
constrained by some measure of fineness to enable a limit operation.
In [3] and [10], $\varepsilon$-entropy is introduced as the minimal
entropy of all quantizations using sets of diameter less than
$\varepsilon$. However, to specify a diameter, a distortion function has
to be defined. Since all basic information-theoretic quantities
(e.g., mutual information or Kullback-Leibler divergence) do
not depend on a specific distortion function, it is hardly possible
to embed $\varepsilon$-entropy into a general information-theoretic
framework. Furthermore, once again the quantization process
seems difficult to handle analytically.

Since the aforementioned approaches do not provide a satisfactory
generalization of (differential) entropy, we follow a
different approach, which is also motivated by ever finer quantizations
of the random variable. However, in our approach, the
order of the two steps “taking the limit of quantizations” and
“calculating the entropy as the expectation of the logarithm of
a probability (mass) function” is changed. More precisely, we
first consider the probability mass functions of quantizations
and take a normalized limit. (In the special case of a continuous
random variable, this results in the probability density
function due to Lebesgue’s differentiation theorem.) Then we
take the expectation of the logarithm of the resulting density
function. Due to fundamental results in geometric measure
theory, this approach can result in a well-defined entropy
only for integer-dimensional distributions, since otherwise the
density function does not exist [11, Th. 3.1]. In fact, the
existence of the density function implies that the random
variable is distributed according to a rectifiable measure [11,
Th. 1.1]. Thus, the distributions considered in the present paper
are rectifiable distributions on Euclidean space. Although this
is still far from the generality of arbitrary probability distributions,
it covers numerous interesting cases (including all the
examples mentioned above) and gives valuable insights.

The density function of rectifiable measures can also be 
defined as a certain Radon-Nikodym derivative. A generalized 
(differential) entropy based on a Radon-Nikodym derivative 
with respect to a “measure of the observer’s interest” was considered
in [12]. Our entropy is consistent with this approach,
and at a certain point we will use a result on quantization
problems established in [12]. However, because in our setting
a concrete measure is considered, the results we obtain go beyond
the basic properties derived in [12] for general measures.

B. Contributions 

We provide a generalization of the classical concepts of
entropy and differential entropy to integer-dimensional random
variables. Our entropy satisfies several well-known properties
of differential entropy: it is invariant under unitary transformations,
transforms as expected under Lipschitz mappings, and
can be extended to joint and conditional entropy. We show that
the entropy of discrete random variables and the differential
entropy of continuous random variables are special cases of
our entropy definition. For joint entropy, we prove a chain
rule which takes the geometry of the support set into account.
Furthermore, we discuss why in certain cases our entropy
definition may violate the classical result that conditioning
does not increase (differential) entropy. We provide expressions
of the mutual information between integer-dimensional
random variables in terms of our entropy. We also show that an
asymptotic equipartition property analogous to [13, Sec. 8.2]
holds for our entropy, but with the Lebesgue measure replaced
by the Hausdorff measure of appropriate dimension.

¹This entropy should not be confused with the information dimension
defined in the same paper [9], which is indeed a very useful and widely
used tool.

In our proofs, we exercise care to detail all assumptions and 
to obtain mathematically rigorous statements. Thus, although 
many of our results might seem obvious to the cursory 
reader because of their similarity to well-known results for 
(differential) entropy, we emphasize that they are not simply 
replicas or straightforward adaptations of known results. This 
becomes evident, e.g., for the chain rule (see Theorem 41 in 
Section VI-C), which might be expected to have the same form 
as the chain rule for differential entropy. However, already a 
simple example will show that the geometry of the support 
set may lead to an additional term, which is not present in the 
special case of continuous random variables. 

As a first application of the proposed entropy, we derive 
a result on the minimal expected binary codeword length of 
quantized integer-dimensional singular sources. More specifically,
we show that our entropy characterizes the rate at
which an arbitrarily fine quantization of an integer-dimensional
singular source can be compressed. Another application is
a lower bound on the rate-distortion function of an integer-dimensional
singular source that resembles the Shannon lower
bound for discrete [4, Sec. 4.3] and continuous [4, Sec. 4.6] 
random variables. For the specific case of a singular source that 
is uniformly distributed on the unit circle, we demonstrate that 
our bound is within 0.2 nat of the true rate-distortion function. 

C. Notation 

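Sets are denoted by calligraphic letters (e.g., $\mathcal{A}$). The
complement of a set $\mathcal{A}$ is denoted $\mathcal{A}^c$. Sets of sets are
denoted by fraktur letters (e.g., $\mathfrak{M}$). The set of natural
numbers $\{1, 2, \dots\}$ is denoted as $\mathbb{N}$. The open ball with
center $\mathbf{x} \in \mathbb{R}^M$ and radius $r > 0$ is denoted by $\mathcal{B}_r(\mathbf{x})$,
i.e., $\mathcal{B}_r(\mathbf{x}) = \{\mathbf{y} \in \mathbb{R}^M : \|\mathbf{y} - \mathbf{x}\| < r\}$. The symbol
$\omega(M)$ denotes the volume of the $M$-dimensional unit ball, i.e.,
$\omega(M) = \pi^{M/2}/\Gamma(1 + M/2)$, where $\Gamma$ is the Gamma function.
Boldface uppercase and lowercase letters denote matrices and
vectors, respectively. The $m \times m$ identity matrix is denoted
by $\mathbf{I}_m$. Sans serif letters denote random quantities, e.g., $\boldsymbol{\mathsf{x}}$ is
a random vector and $\mathsf{x}$ is a random scalar. The superscript
$T$ stands for transposition. For $x \in \mathbb{R}$, $\lfloor x \rfloor = \max\{m \in \mathbb{Z} : m \le x\}$,
and for $\mathbf{x} \in \mathbb{R}^M$, $\lfloor \mathbf{x} \rfloor = (\lfloor x_1 \rfloor \cdots \lfloor x_M \rfloor)^T$.
Similarly, $\lceil x \rceil = \min\{m \in \mathbb{Z} : m \ge x\}$. We write $\mathrm{E}_{\boldsymbol{\mathsf{x}}}[\cdot]$ for
the expectation operator with respect to the random variable
$\boldsymbol{\mathsf{x}}$. $\Pr\{\boldsymbol{\mathsf{x}} \in \mathcal{A}\}$ denotes the probability that $\boldsymbol{\mathsf{x}} \in \mathcal{A}$. For
$\mathbf{x} \in \mathbb{R}^{M_1}$ and $\mathbf{y} \in \mathbb{R}^{M_2}$, we denote by $p_{\mathbf{x}} \colon \mathbb{R}^{M_1+M_2} \to \mathbb{R}^{M_1}$,
$p_{\mathbf{x}}(\mathbf{x}, \mathbf{y}) = \mathbf{x}$, the projection of $\mathbb{R}^{M_1+M_2}$ to the first $M_1$
components. Similarly, $p_{\mathbf{y}} \colon \mathbb{R}^{M_1+M_2} \to \mathbb{R}^{M_2}$, $p_{\mathbf{y}}(\mathbf{x}, \mathbf{y}) = \mathbf{y}$,
denotes the projection of $\mathbb{R}^{M_1+M_2}$ to the last $M_2$ components.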
The generalized Jacobian determinant of a Lipschitz function²
$\phi$ is written as $J_\phi$. For a function $\phi$ with domain $\mathcal{D}$ and a
subset $\mathcal{V} \subseteq \mathcal{D}$, we denote by $\phi|_{\mathcal{V}}$ the restriction of $\phi$ to
the domain $\mathcal{V}$. $\mathscr{H}^m$ denotes the $m$-dimensional Hausdorff
measure.³ $\mathscr{L}^M$ denotes the $M$-dimensional Lebesgue measure,
and $\mathfrak{B}_M$ denotes the Borel $\sigma$-algebra on $\mathbb{R}^M$. For a measure
$\mu$ and a $\mu$-measurable function $f$, the induced measure is
given as $\mu f^{-1}(\mathcal{A}) = \mu(f^{-1}(\mathcal{A}))$. For two measures $\mu$ and
$\nu$ on the same measurable space, we indicate by $\mu \ll \nu$
that $\mu$ is absolutely continuous with respect to $\nu$ (i.e., for
any measurable set $\mathcal{A}$, $\nu(\mathcal{A}) = 0$ implies $\mu(\mathcal{A}) = 0$). For
a measure $\mu$ and a measurable set $\mathcal{E}$, the measure $\mu|_{\mathcal{E}}$ is the
restriction of $\mu$ to $\mathcal{E}$, i.e., $\mu|_{\mathcal{E}}(\mathcal{A}) = \mu(\mathcal{A} \cap \mathcal{E})$. The logarithm
to the base $e$ is denoted $\log$, and the logarithm to the base 2
is denoted $\operatorname{ld}$. In certain equations, we reference an equation
number on top of the equality sign in order to indicate that
the equality holds due to some previous equation: e.g., $\overset{(42)}{=}$
indicates that the equality holds due to eq. (42).

D. Organization of the Paper 

The rest of this paper is organized as follows. In Section II,
we review the established definitions of entropy
and describe the intuitive idea behind our entropy definition.
Rectifiable sets, measures, and random variables are
introduced in Section III as the basic setting for integer-dimensional
distributions. In Section IV, we develop the
theory of “lower-dimensional entropy”: we define entropy for
integer-dimensional random variables, prove a transformation
property and invariance under unitary transformations, demonstrate
connections to classical entropy and differential entropy,
and provide examples by calculating the entropy of random
variables supported on the unit circle in $\mathbb{R}^2$ and of positive
semidefinite rank-one random matrices. In Sections V and
VI, we introduce and discuss joint entropy and conditional
entropy, respectively. Relations of our entropy to the mutual
information between integer-dimensional random variables are
demonstrated in Section VII. In Section VIII, we prove an
asymptotic equipartition property for our entropy. In Section IX,
we present a result on the minimal expected binary
codeword length of quantized integer-dimensional sources. In
Section X, we derive a Shannon lower bound for integer-dimensional
singular sources and evaluate it for a source that
is uniformly distributed on the unit circle.

II. Previous Work and Motivation 

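We first recall the definitions of entropy for discrete random
variables [13, Ch. 2] and differential entropy for continuous
random variables [13, Ch. 8]. Let $\boldsymbol{\mathsf{x}}$ be a discrete random
variable with probability mass function $p_{\boldsymbol{\mathsf{x}}}(\mathbf{x}_i) = \Pr\{\boldsymbol{\mathsf{x}} = \mathbf{x}_i\}$,
$i \in \mathcal{I}$, where $\mathcal{I}$ is the finite or countably infinite set indexing
all possible realizations $\mathbf{x}_i$ of $\boldsymbol{\mathsf{x}}$. The entropy of $\boldsymbol{\mathsf{x}}$ is

$$H(\boldsymbol{\mathsf{x}}) = -\mathrm{E}_{\boldsymbol{\mathsf{x}}}\big[\log p_{\boldsymbol{\mathsf{x}}}(\boldsymbol{\mathsf{x}})\big] = -\sum_{i \in \mathcal{I}} p_{\boldsymbol{\mathsf{x}}}(\mathbf{x}_i) \log p_{\boldsymbol{\mathsf{x}}}(\mathbf{x}_i). \tag{1}$$

For a continuous random variable $\boldsymbol{\mathsf{x}}$ on $\mathbb{R}^M$ with probability
density function $f_{\boldsymbol{\mathsf{x}}}$, the differential entropy is

$$h(\boldsymbol{\mathsf{x}}) = -\mathrm{E}_{\boldsymbol{\mathsf{x}}}\big[\log f_{\boldsymbol{\mathsf{x}}}(\boldsymbol{\mathsf{x}})\big] = -\int_{\mathbb{R}^M} f_{\boldsymbol{\mathsf{x}}}(\mathbf{x}) \log f_{\boldsymbol{\mathsf{x}}}(\mathbf{x})\, d\mathscr{L}^M(\mathbf{x}). \tag{2}$$

We note that $h(\boldsymbol{\mathsf{x}})$ may be $\pm\infty$ or undefined.

²By Rademacher's theorem [14, Th. 2.14], a Lipschitz function is differentiable
almost everywhere and, thus, the Jacobian determinant is well defined
almost everywhere.

³Readers unfamiliar with this concept may think of it as a measure of an
$m$-dimensional area in a higher-dimensional space (e.g., surfaces in $\mathbb{R}^3$). An
introduction and definition can be found in [14, Sec. 2.8].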
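To make (1) and (2) concrete, the following minimal Python sketch (our own illustration) evaluates (1) for a small probability mass function and approximates (2) for a scalar Gaussian ($M = 1$) with the midpoint rule. The helper names, the truncation interval, and the grid size are illustrative assumptions; the comparison value $\frac{1}{2}\log(2\pi e\sigma^2)$ is the standard closed form of the Gaussian differential entropy in nats.

```python
import math

# Illustrative sketch; function names and numerical parameters are our own choices.

def entropy_nats(pmf):
    """H(x) = -sum_i p(x_i) log p(x_i) as in (1); zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in pmf if p > 0)

def gaussian_pdf(x, sigma):
    return math.exp(-x * x / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

def differential_entropy_1d(pdf, a, b, n=200_000):
    """h(x) = -int f log f dL^1 as in (2) for M = 1, midpoint rule on [a, b]."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        f = pdf(a + (i + 0.5) * dx)
        if f > 0:
            total -= f * math.log(f) * dx
    return total

sigma = 2.0
h_num = differential_entropy_1d(lambda x: gaussian_pdf(x, sigma), -12 * sigma, 12 * sigma)
h_closed = 0.5 * math.log(2 * math.pi * math.e * sigma**2)
print(entropy_nats([0.5, 0.25, 0.25]))  # equals 1.5 * log 2
print(h_num, h_closed)
```

Truncating the integral to $[-12\sigma, 12\sigma]$ is harmless here because the Gaussian tails outside that interval contribute negligibly; a heavier-tailed density would need a wider range or a different quadrature.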

A. Entropy of Dimension $d(\boldsymbol{\mathsf{x}})$ and $\varepsilon$-Entropy

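There exist two previously proposed generalizations of
(differential) entropy to a larger set of probability distributions.
The first generalization is based on quantizations of the
random variable to ever finer cubes [9]. More specifically,
for a (possibly singular) random variable $\boldsymbol{\mathsf{x}} \in \mathbb{R}^M$, the Rényi
information dimension of $\boldsymbol{\mathsf{x}}$ is

$$d(\boldsymbol{\mathsf{x}}) = \lim_{n \to \infty} \frac{H(\lfloor n\boldsymbol{\mathsf{x}} \rfloor)}{\log n} \tag{3}$$

and the entropy of dimension $d(\boldsymbol{\mathsf{x}})$ of $\boldsymbol{\mathsf{x}}$ is defined as

$$h_{d(\boldsymbol{\mathsf{x}})}(\boldsymbol{\mathsf{x}}) \triangleq \lim_{n \to \infty} \big( H(\lfloor n\boldsymbol{\mathsf{x}} \rfloor) - d(\boldsymbol{\mathsf{x}}) \log n \big), \tag{4}$$

provided the limits in (3) and (4) exist.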

This definition of entropy of dimension d(x) corresponds to 
the following procedure: 

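1) Quantize $\boldsymbol{\mathsf{x}}$ using the cubes $\prod_{i=1}^{M} \big[\frac{k_i}{n}, \frac{k_i+1}{n}\big)$ with $\mathbf{k} = (k_1 \cdots k_M)^T \in \mathbb{Z}^M$,
i.e., consider the discrete random variable with probabilities
$p_{\mathbf{k}} = \Pr\big\{\boldsymbol{\mathsf{x}} \in \prod_{i=1}^{M} \big[\frac{k_i}{n}, \frac{k_i+1}{n}\big)\big\}$.

2) Calculate the entropy of the quantized random variable,
i.e., the negative expectation of the logarithm of the
probability mass function $p_{\mathbf{k}}$.

3) Subtract the correction term $d(\boldsymbol{\mathsf{x}}) \log n$ to account for the
dimension of the random variable $\boldsymbol{\mathsf{x}}$.

4) Take the limit $n \to \infty$.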
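The four steps can be sketched in Python for a case where the answer is known: a scalar Gaussian random variable is continuous, so $d(\boldsymbol{\mathsf{x}}) = 1$, and the limit in (4) equals the differential entropy (2). This is only an illustration under those assumptions; the function name, the tail truncation, and the values of $n$ are our own choices.

```python
import math

# Illustrative sketch; quantized_entropy_gaussian is a hypothetical helper, not from the paper.

def quantized_entropy_gaussian(n, sigma=1.0, tail=12.0):
    """Entropy (nats) of floor(n x) for scalar x ~ N(0, sigma^2).

    Cell k is [k/n, (k+1)/n); cells beyond +-tail*sigma are ignored (negligible mass)."""
    def cdf(t):
        return 0.5 * (1.0 + math.erf(t / (sigma * math.sqrt(2.0))))

    K = math.ceil(tail * sigma * n)
    H = 0.0
    for k in range(-K, K):
        p = cdf((k + 1) / n) - cdf(k / n)  # step 1: probability of cell k
        if p > 0:
            H -= p * math.log(p)           # step 2: entropy of the quantized variable
    return H

d = 1                                      # information dimension of a continuous scalar
h = 0.5 * math.log(2 * math.pi * math.e)   # differential entropy of N(0, 1) in nats
for n in (10, 100, 1000):
    # step 3: subtract d(x) log n; step 4 corresponds to letting n grow
    print(n, quantized_entropy_gaussian(n) - d * math.log(n), h)
```

For this continuous example, shifting the grid does not change the limit; the grid dependence discussed next concerns singular distributions.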

Although this approach seems reasonable, there are several
issues. First, the definition of $h_{d(\boldsymbol{\mathsf{x}})}(\boldsymbol{\mathsf{x}})$ seems to be difficult
to handle analytically, and connections to major information-theoretic
concepts such as mutual information are not available.
Furthermore, the quantization used is just one of many
possible; we might, e.g., consider a shifted version of the set
of cubes $\prod_{i=1}^{M} \big[\frac{k_i}{n}, \frac{k_i+1}{n}\big)$, which, for singular distributions,
may result in a different value of the resulting entropy.

An approach that overcomes the latter issue is the concept of
$\varepsilon$-entropy [3], [10]. The definition of $\varepsilon$-entropy does not use a
specific quantization but takes the infimum of the entropy over
all possible (countable) quantizations under a constraint on the
diameter of the quantization sets. This is motivated by data
compression: the quantization should be such that an error of
at most $\varepsilon$ is made (thus, the quantization sets have maximal
diameter $\varepsilon$) and at the same time the minimal possible number
of bits should be used to encode the data (thus, the entropy is
minimized over all possible quantizations). More specifically,
for a random variable $\boldsymbol{\mathsf{x}} \in \mathbb{R}^M$, let $\mathfrak{P}_\varepsilon$ denote the set of all
countable partitions of $\mathbb{R}^M$ into mutually disjoint, measurable
sets of diameter at most $\varepsilon$. Furthermore, for a partition
$\mathfrak{Q} = \{\mathcal{A}_i : i \in \mathbb{N}\} \in \mathfrak{P}_\varepsilon$, the quantization $[\boldsymbol{\mathsf{x}}]_{\mathfrak{Q}} \in \mathbb{N}$ is the discrete
random variable defined by $p_i = \Pr\{[\boldsymbol{\mathsf{x}}]_{\mathfrak{Q}} = i\} = \Pr\{\boldsymbol{\mathsf{x}} \in \mathcal{A}_i\}$
for $i \in \mathbb{N}$. Then the $\varepsilon$-entropy of $\boldsymbol{\mathsf{x}}$ is defined as

$$H_\varepsilon(\boldsymbol{\mathsf{x}}) = \inf_{\mathfrak{Q} \in \mathfrak{P}_\varepsilon} H\big([\boldsymbol{\mathsf{x}}]_{\mathfrak{Q}}\big). \tag{5}$$


Here, a problem is that $H_\varepsilon(\boldsymbol{\mathsf{x}})$ is only defined for a fixed
$\varepsilon > 0$, and for nondiscrete distributions it diverges to $\infty$ as $\varepsilon \to 0$.
However, as in the case of the Rényi information dimension,
a correction term can be obtained using the following
seemingly new definition of information dimension:

$$d^*(\boldsymbol{\mathsf{x}}) = \lim_{\varepsilon \to 0} \frac{H_\varepsilon(\boldsymbol{\mathsf{x}})}{\log \frac{1}{\varepsilon}}.$$

By [15, Prop. 3.3], the definitions of information dimension
using Rényi’s approach and the $\varepsilon$-entropy approach coincide,
i.e., $d^*(\boldsymbol{\mathsf{x}}) = d(\boldsymbol{\mathsf{x}})$. This suggests the following new definition
of a $d(\boldsymbol{\mathsf{x}})$-dimensional entropy.

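Definition 1: Let $\boldsymbol{\mathsf{x}} \in \mathbb{R}^M$ be a random variable with
existing information dimension $d(\boldsymbol{\mathsf{x}})$. Then the asymptotic $\varepsilon$-entropy
of dimension $d(\boldsymbol{\mathsf{x}})$ is defined as

$$h^*_{d(\boldsymbol{\mathsf{x}})}(\boldsymbol{\mathsf{x}}) \triangleq \lim_{\varepsilon \to 0} \big( H_\varepsilon(\boldsymbol{\mathsf{x}}) + d(\boldsymbol{\mathsf{x}}) \log \varepsilon \big).$$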

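This definition corresponds to the following procedure:

1) Quantize $\boldsymbol{\mathsf{x}}$ using an entropy-minimizing quantization⁴ $\mathfrak{Q}$
given a diameter constraint $\varepsilon$, i.e., consider the discrete
random variable $[\boldsymbol{\mathsf{x}}]_{\mathfrak{Q}}$ with probabilities
$p_i = \Pr\{[\boldsymbol{\mathsf{x}}]_{\mathfrak{Q}} = i\} = \Pr\{\boldsymbol{\mathsf{x}} \in \mathcal{A}_i\}$ for $\mathcal{A}_i \in \mathfrak{Q}$, where the diameter of
each $\mathcal{A}_i$ is upper bounded by $\varepsilon$.

2) Calculate the entropy of the quantized random variable
$[\boldsymbol{\mathsf{x}}]_{\mathfrak{Q}}$, i.e., the negative expectation of the logarithm of the
probability mass function $p_i$.

3) Add the correction term $d(\boldsymbol{\mathsf{x}}) \log \varepsilon$ to account for the
dimension of the random variable $\boldsymbol{\mathsf{x}}$.

4) Take the limit $\varepsilon \to 0$.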
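Computing the infimum in (5) is hard in general. The following Python sketch therefore evaluates only one admissible partition for $\boldsymbol{\mathsf{x}}$ uniformly distributed on the unit circle in $\mathbb{R}^2$: $K$ equal, half-open arcs, each of diameter (chord length) $2\sin(\pi/K) \le \varepsilon$, with the rest of $\mathbb{R}^2$ split into small cells of probability zero that do not affect the entropy. Its entropy $\log K$ is an upper bound on $H_\varepsilon(\boldsymbol{\mathsf{x}})$; we do not claim this partition is entropy-minimizing, and the function name is our own.

```python
import math

# Illustrative sketch; arc_partition_entropy is hypothetical and only upper-bounds H_eps.

def arc_partition_entropy(eps):
    """log K for K equal half-open arcs of the unit circle with chord diameter <= eps.

    For x uniform on the circle each arc has probability 1/K, so this is the entropy of
    one admissible quantization and hence an upper bound on H_eps(x) in (5)."""
    if not 0.0 < eps < 2.0:
        raise ValueError("eps must lie in (0, 2)")
    # An arc of angle 2*pi/K <= pi has diameter 2*sin(pi/K); require 2*sin(pi/K) <= eps.
    K = math.ceil(math.pi / math.asin(eps / 2.0))
    return math.log(K)

d = 1  # the circle is one-dimensional
for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    H_up = arc_partition_entropy(eps)
    # column 2 upper-bounds H_eps / log(1/eps) and tends to d slowly;
    # column 3 upper-bounds H_eps + d log eps from Definition 1
    print(eps, H_up / math.log(1.0 / eps), H_up + d * math.log(eps), math.log(2 * math.pi))
```

The third column approaches $\log(2\pi) \approx 1.8379$ nats, the logarithm of the circle length. This bounds the asymptotic $\varepsilon$-entropy of this source from above but says nothing by itself about the infimum.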

Although this entropy is more general than the entropy of
dimension $d(\boldsymbol{\mathsf{x}})$ in (4), the fundamental problems persist: we
are still restricted to the choice of sets of small diameter
(this is of course useful if we consider maximal distance
as a measure of distortion but can yield unnecessarily many
quantization points for areas of almost zero probability), and
the definition still seems to be difficult to handle analytically
and lacks connections to established information-theoretic
quantities such as mutual information.

⁴We assume for simplicity that an entropy-minimizing quantization exists,
although in general the infimum in (5) may not be attained.

B. An Alternative Approach

Here, we propose a different approach, which is motivated
by the definition of differential entropy. The basic idea is
to circumvent the quantization step and perform the entropy
calculation at the end. Assuming $\boldsymbol{\mathsf{x}} \in \mathbb{R}^M$, this results in the
following procedure:

1) For some $\mathbf{x} \in \mathbb{R}^M$, divide the probability $\Pr\{\boldsymbol{\mathsf{x}} \in \mathcal{B}_\varepsilon(\mathbf{x})\}$
by the correction factor⁵ $\omega(d(\boldsymbol{\mathsf{x}}))\, \varepsilon^{d(\boldsymbol{\mathsf{x}})}$. (Recall
that $\omega(d(\boldsymbol{\mathsf{x}}))$ denotes the volume of the $d(\boldsymbol{\mathsf{x}})$-dimensional
unit ball.)

2) Take the limit $\varepsilon \to 0$.

3) Calculate the entropy as the negative expectation of the
logarithm of the resulting density function.

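More specifically, steps 1-2 yield the density function⁶

$$\theta_{\boldsymbol{\mathsf{x}}}(\mathbf{x}) = \lim_{\varepsilon \to 0} \frac{\Pr\{\boldsymbol{\mathsf{x}} \in \mathcal{B}_\varepsilon(\mathbf{x})\}}{\omega(d(\boldsymbol{\mathsf{x}}))\, \varepsilon^{d(\boldsymbol{\mathsf{x}})}} \tag{6}$$

and the entropy in step 3 is thus given by

$$\mathfrak{h}_{d(\boldsymbol{\mathsf{x}})}(\boldsymbol{\mathsf{x}}) \triangleq -\mathrm{E}_{\boldsymbol{\mathsf{x}}}\big[\log \theta_{\boldsymbol{\mathsf{x}}}(\boldsymbol{\mathsf{x}})\big]. \tag{7}$$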
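For $\boldsymbol{\mathsf{x}}$ uniformly distributed on the unit circle, the ratio in (6) has a closed form, which makes the procedure easy to check numerically. A circle point at angle $\varphi$ from a point $\mathbf{x}$ on the circle lies at Euclidean distance $2\sin(|\varphi|/2)$, so $\Pr\{\boldsymbol{\mathsf{x}} \in \mathcal{B}_\varepsilon(\mathbf{x})\} = 4\arcsin(\varepsilon/2)/(2\pi)$ for $0 < \varepsilon \le 2$ (our own derivation for this example). The Python sketch below, with function names of our choosing, shows the ratio approaching $1/(2\pi)$ for $d(\boldsymbol{\mathsf{x}}) = 1$ and $\omega(1) = 2$, so that (7) evaluates to $\log(2\pi)$.

```python
import math

# Illustrative sketch; helper names are our own, not from the paper.

def unit_ball_volume(d):
    """omega(d) = pi^(d/2) / Gamma(1 + d/2); omega(1) = 2 is the length of [-1, 1]."""
    return math.pi ** (d / 2) / math.gamma(1 + d / 2)

def ball_probability_circle(eps):
    """Pr{x in B_eps(p)} for x uniform on the unit circle and p on the circle.

    Circle points within distance eps of p form an arc of half-angle 2*asin(eps/2)."""
    eps = min(eps, 2.0)
    return 4.0 * math.asin(eps / 2.0) / (2.0 * math.pi)

def density_ratio(eps, d=1):
    """Steps 1-2 of the procedure: the ratio inside the limit in (6)."""
    return ball_probability_circle(eps) / (unit_ball_volume(d) * eps ** d)

for eps in (1.0, 0.1, 0.01, 0.001):
    print(eps, density_ratio(eps), 1 / (2 * math.pi))

# theta_x(x) = 1/(2*pi) at every point of the circle, so (7) gives log(2*pi).
entropy = -math.log(1 / (2 * math.pi))
print(entropy)
```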


We will show that this definition of entropy will lead to 
definitions of joint and conditional entropy, various useful 
relations, connections to mutual information, an asymptotic 
equipartition property, and bounds relevant to source coding. 
However, our definition does have one limitation: as pointed
out in [6, Sec. VII-A], the existence of the limit in (6) for
almost every $\mathbf{x} \in \mathbb{R}^M$ is a much stronger assumption than
the existence of the Rényi information dimension (3). Loosely
speaking, the existence of the limit in (6) requires that the
random variable $\boldsymbol{\mathsf{x}}$ is $d(\boldsymbol{\mathsf{x}})$-dimensional almost everywhere,
whereas the existence of the Rényi information dimension
merely requires that the random variable is $d(\boldsymbol{\mathsf{x}})$-dimensional
“on average.” By Preiss’ Theorem [16, Th. 5.6], convergence
in (6) even implies that the probability measure induced
by the random variable $\boldsymbol{\mathsf{x}}$ is rectifiable (see Definition 6
in Section III-B), which means that our definition does not
apply to, e.g., self-similar fractal distributions. However, we
are not aware of any application or calculation of the $d(\boldsymbol{\mathsf{x}})$-dimensional
entropy in (4) (or the asymptotic version of $\varepsilon$-entropy)
for fractal distributions, and it is not clear
whether the $d(\boldsymbol{\mathsf{x}})$-dimensional entropy is well defined in that
case (although the information dimension (3) exists).

The rectifiability also implies that the density function $\theta_{\boldsymbol{\mathsf{x}}}(\mathbf{x})$
is equal to a certain Radon-Nikodym derivative. Based on this
equality, the entropy $\mathfrak{h}_{d(\boldsymbol{\mathsf{x}})}(\boldsymbol{\mathsf{x}})$ defined in (7) and (6) can be
interpreted as a generalized entropy as defined in [12, eq. (1.5)]
by

$$H_\lambda(\mu) = -\int_{\mathbb{R}^M} \log\Big(\frac{d\mu}{d\lambda}(\mathbf{x})\Big)\, d\mu(\mathbf{x}). \tag{8}$$

Here, $\lambda$ is a $\sigma$-finite measure on $\mathbb{R}^M$ and $\mu$ is a probability
measure on $\mathbb{R}^M$. While $\mu$ can be chosen as the measure of a
given random variable, the generalized entropy (8) provides no
intuition on how to choose the measure $\lambda$. It is more similar to
a divergence between measures and, in particular, reduces to
the negative Kullback-Leibler divergence [17] for a probability measure
$\lambda$. We will see (cf. Remark 19) that our entropy definition
coincides with (8) for the choice $\lambda = \mathscr{H}^m|_{\mathcal{E}}$, where $m$ and
$\mathcal{E}$ depend on the given random variable. This interpretation
will allow us to use basic results from [12] for our entropy
definition.

⁵The constant factor $\omega(d(\boldsymbol{\mathsf{x}}))$ is included to obtain equality with differential
entropy in the special case $d(\boldsymbol{\mathsf{x}}) = M$. A different factor would result in an
additive constant in the entropy definition.

⁶A mathematically rigorous definition will be provided in Section III-B.
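As a numerical illustration of how (8) relates to classical quantities, the following Python sketch assumes that (8) has the form $-\int \log\big(\frac{d\mu}{d\lambda}\big)\, d\mu$ and restricts to measures on $\mathbb{R}$ with Lebesgue densities, so that $\frac{d\mu}{d\lambda}$ is a ratio of densities. With $\lambda$ the Lebesgue measure it reproduces the differential entropy (2); with $\lambda$ the standard Gaussian probability measure it reproduces the negative Kullback-Leibler divergence, which for $\mu = \mathcal{N}(0, s^2)$ equals $-(\log(1/s) + s^2/2 - 1/2)$. The function names, the truncation to $[-10, 10]$, and the Gaussian choices are our own.

```python
import math

# Illustrative sketch; generalized_entropy and the test measures are hypothetical choices.

def normal_pdf(x, s):
    return math.exp(-x * x / (2 * s * s)) / (math.sqrt(2 * math.pi) * s)

def generalized_entropy(mu_pdf, lam_pdf, a=-10.0, b=10.0, n=200_000):
    """Midpoint-rule approximation of -int log(dmu/dlambda) dmu over [a, b].

    Both measures are assumed to have Lebesgue densities, so
    dmu/dlambda(x) = mu_pdf(x) / lam_pdf(x) wherever mu_pdf(x) > 0."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        f = mu_pdf(x)
        if f > 0:
            total -= math.log(f / lam_pdf(x)) * f * dx
    return total

s = 0.5

def mu(x):
    return normal_pdf(x, s)

def lebesgue(x):
    return 1.0  # density of Lebesgue measure with respect to itself

def std_normal(x):
    return normal_pdf(x, 1.0)

h_mu = generalized_entropy(mu, lebesgue)      # compare with (2)
neg_kl = generalized_entropy(mu, std_normal)  # compare with -D(mu || N(0, 1))
print(h_mu, 0.5 * math.log(2 * math.pi * math.e * s * s))
print(neg_kl, -(math.log(1 / s) + s * s / 2 - 0.5))
```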

Motivated by the entropy expression in (7), a formal definition
of the entropy of an integer-dimensional random variable
will be given in Section IV-A, based on the mathematical
theory of rectifiable measures discussed next.

III. Rectifiable Random Variables 

As mentioned in Section II-B, the existence of a $d(\boldsymbol{\mathsf{x}})$-dimensional
density implies that the random variable $\boldsymbol{\mathsf{x}}$ is
rectifiable. In this section, we recall the definitions of rectifiable
sets and measures and introduce rectifiable random
variables as a straightforward extension. Furthermore, we
present some basic properties that will be used in subsequent
sections. For the convenience of readers who prefer to skip the
mathematical details, we summarize the most important facts
in Corollary 12.

A. Rectifiable Sets 

Our basic geometric objects of interest are rectifiable sets
[18, Sec. 3.2.14]. As the definition of rectifiable sets is not
consistent in the literature, we provide the definition most
convenient for our purpose. We recall that $\mathscr{H}^m$ denotes the
$m$-dimensional Hausdorff measure.

Definition 2 ([14, Def. 2.57]): For $m \in \mathbb{N}$, an $\mathscr{H}^m$-measurable
set $\mathcal{E} \subseteq \mathbb{R}^M$ (with $M \ge m$) is called $m$-rectifiable⁷
if there exist $\mathscr{L}^m$-measurable, bounded sets $\mathcal{A}_k \subseteq \mathbb{R}^m$ and
Lipschitz functions $f_k \colon \mathcal{A}_k \to \mathbb{R}^M$, both for⁸ $k \in \mathbb{N}$, such
that $\mathscr{H}^m\big(\mathcal{E} \setminus \bigcup_{k \in \mathbb{N}} f_k(\mathcal{A}_k)\big) = 0$. A set $\mathcal{E} \subseteq \mathbb{R}^M$ is called
0-rectifiable if it is finite or countably infinite.

Remark 3: Hereafter, we will often consider the setting of
$m$-rectifiable sets in $\mathbb{R}^M$ and tacitly assume $m \in \{0, \dots, M\}$.

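Rectifiable sets satisfy the following well-known basic properties.

Lemma 4: Let $\mathcal{E}$ be an $m$-rectifiable subset of $\mathbb{R}^M$.

1) Any $\mathscr{H}^m$-measurable subset $\mathcal{D} \subseteq \mathcal{E}$ is also $m$-rectifiable.

2) The measure $\mathscr{H}^m|_{\mathcal{E}}$ is $\sigma$-finite.

3) Let $\phi \colon \mathbb{R}^M \to \mathbb{R}^N$ with $N \ge m$ be a Lipschitz function.
If $\phi(\mathcal{E})$ is $\mathscr{H}^m$-measurable, then it is $m$-rectifiable.

4) For $n > m$, we have $\mathscr{H}^n(\mathcal{E}) = 0$.

5) Let $\mathcal{E}_i$ for $i \in \mathbb{N}$ be $m$-rectifiable sets. Then $\bigcup_{i \in \mathbb{N}} \mathcal{E}_i$ is
$m$-rectifiable.

6) For $m \ne 0$, $\mathbb{R}^m$ is $m$-rectifiable.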

Intuitively, rectifiable sets are lower-dimensional subsets of 
Euclidean space. Examples include affine subspaces, algebraic 
varieties, differentiable manifolds, and graphs of Lipschitz 
functions. As countable unions of rectifiable sets are again 
rectifiable, further examples are countable unions of any of 
the aforementioned sets. 
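Some of these examples can be connected to Definition 2 numerically. The unit circle is the image of the bounded set $[0, 2\pi]$ under the Lipschitz map $t \mapsto (\cos t, \sin t)$, and the graph of $t \mapsto |t|$ on $[-1, 1]$ is the image of $[-1, 1]$ under the Lipschitz map $t \mapsto (t, |t|)$. For such simple curves, $\mathscr{H}^1$ of the image equals the curve length, which inscribed polylines approximate. The Python sketch below (function name and grid size are our own) is an illustration only, not a general method for computing Hausdorff measures.

```python
import math

# Illustrative sketch; curve_length is a hypothetical helper, not from the paper.

def curve_length(f, a, b, n=2000):
    """Length of the polyline through f(t_0), ..., f(t_n) with t_i = a + i*(b - a)/n.

    For a curve that is injective up to finitely many points, this converges to the
    one-dimensional Hausdorff measure H^1 of the image as n grows."""
    pts = [f(a + i * (b - a) / n) for i in range(n + 1)]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

# Graph of the Lipschitz (but not differentiable) map t -> |t| on [-1, 1]; length 2*sqrt(2).
graph_len = curve_length(lambda t: (t, abs(t)), -1.0, 1.0)
# Unit circle as the image of [0, 2*pi] under t -> (cos t, sin t); length 2*pi.
circle_len = curve_length(lambda t: (math.cos(t), math.sin(t)), 0.0, 2.0 * math.pi)
print(graph_len, 2 * math.sqrt(2))
print(circle_len, 2 * math.pi)
```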

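Remark 5: There are various characterizations of $m$-rectifiable
sets that provide connections to other mathematical
disciplines. For example, an $\mathscr{H}^m$-measurable set $\mathcal{E} \subseteq \mathbb{R}^M$
is $m$-rectifiable if and only if there exist $\mathcal{T}_k \subseteq \mathbb{R}^M$ such
that $\mathcal{E} \subseteq \mathcal{T}_0 \cup \bigcup_{k \in \mathbb{N}} \mathcal{T}_k$, where $\mathscr{H}^m(\mathcal{T}_0) = 0$ and each
$\mathcal{T}_k$, $k \in \mathbb{N}$, is an $m$-dimensional, embedded $C^1$ submanifold of $\mathbb{R}^M$
[19, Lem. 5.4.2]. Another characterization, based on [18,
Cor. 3.2.4], is that $\mathcal{E} \subseteq \mathbb{R}^M$ is $m$-rectifiable if and only if

$$\mathcal{E} \subseteq \mathcal{E}_0 \cup \bigcup_{k \in \mathbb{N}} f_k(\mathcal{A}_k), \tag{9}$$

where $\mathscr{H}^m(\mathcal{E}_0) = 0$, $\mathcal{A}_k$ are bounded Borel sets, and
$f_k \colon \mathbb{R}^m \to \mathbb{R}^M$ are Lipschitz functions that are one-to-one
on $\mathcal{A}_k$. Due to [20, Th. 15.1], this implies that the sets $f_k(\mathcal{A}_k)$ are
also Borel sets.

⁷In [14, Def. 2.57], these sets are called countably $\mathscr{H}^m$-rectifiable.

⁸This definition also encompasses finite index sets $k \in \{1, \dots, K\}$; it
suffices to set $\mathcal{A}_k = \emptyset$ for $k > K$.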

B. Rectifiable Measures 

Loosely speaking, rectifiable measures are measures that
are concentrated on a rectifiable set. The most convenient
way to define “concentrated on” mathematically is in terms
of absolute continuity with respect to a specific Hausdorff
measure.

Definition 6 ([14, Def. 2.59]): A Borel measure $\mu$ on $\mathbb{R}^M$ is
called $m$-rectifiable if there exists an $m$-rectifiable set $\mathcal{E} \subseteq \mathbb{R}^M$
such that $\mu \ll \mathscr{H}^m|_{\mathcal{E}}$.

For an $m$-rectifiable measure $\mu$, i.e., $\mu \ll \mathscr{H}^m|_{\mathcal{E}}$ for an $m$-rectifiable
set $\mathcal{E} \subseteq \mathbb{R}^M$, we have by Property 2 in Lemma 4
that $\mathscr{H}^m|_{\mathcal{E}}$ is $\sigma$-finite. Thus, by the Radon-Nikodym theorem
[14, Th. 1.28], there exists the Radon-Nikodym derivative

$$\theta^m_\mu(\mathbf{x}) = \frac{d\mu}{d\mathscr{H}^m|_{\mathcal{E}}}(\mathbf{x}) \tag{10}$$

satisfying $d\mu = \theta^m_\mu\, d\mathscr{H}^m|_{\mathcal{E}}$. We will refer to $\theta^m_\mu(\mathbf{x})$ as the
$m$-dimensional Hausdorff density of $\mu$.

Remark 7: If $\mu$ is an $m$-rectifiable probability measure, it
cannot be $n$-rectifiable for $n \ne m$. Indeed, suppose that $\mu$ is
both $m$-rectifiable and $n$-rectifiable where, without loss of generality,
$n > m$. Then there exists an $m$-rectifiable set $\mathcal{E}$ such
that $\mu \ll \mathscr{H}^m|_{\mathcal{E}}$, which implies $\mu(\mathcal{E}^c) = 0$. There also exists
an $n$-rectifiable set $\mathcal{F}$ such that $\mu \ll \mathscr{H}^n|_{\mathcal{F}}$. By Property 4 in
Lemma 4, the $m$-rectifiable set $\mathcal{E}$ satisfies $\mathscr{H}^n(\mathcal{E}) = 0$ and, in
particular, $\mathscr{H}^n|_{\mathcal{F}}(\mathcal{E}) = 0$. Because $\mu \ll \mathscr{H}^n|_{\mathcal{F}}$, this implies
$\mu(\mathcal{E}) = 0$. Hence, $\mu(\mathbb{R}^M) = \mu(\mathcal{E}^c) + \mu(\mathcal{E}) = 0$, which
contradicts the assumption that $\mu$ is a probability measure.

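To avoid the nuisance of separately considering the case
$\frac{d\mu}{d\mathscr{H}^m|_{\mathcal{E}}} = 0$ in many proofs and to reduce the class of $m$-rectifiable
sets of interest, we define the following notion of a
support of an $m$-rectifiable measure.

Definition 8: For an $m$-rectifiable measure $\mu$ on $\mathbb{R}^M$, an
$m$-rectifiable set $\mathcal{E} \subseteq \mathbb{R}^M$ is called a support of $\mu$ if
$\mu \ll \mathscr{H}^m|_{\mathcal{E}}$, $\theta^m_\mu(\mathbf{x}) > 0$ $\mathscr{H}^m|_{\mathcal{E}}$-almost everywhere, and
$\mathcal{E} = \bigcup_{k \in \mathbb{N}} f_k(\mathcal{A}_k)$, where, for $k \in \mathbb{N}$, $\mathcal{A}_k$ is a bounded Borel
set and $f_k \colon \mathbb{R}^m \to \mathbb{R}^M$ is a Lipschitz function that is one-to-one
on $\mathcal{A}_k$.

Lemma 9: Let $\mu$ be an $m$-rectifiable measure, i.e.,
$\mu \ll \mathscr{H}^m|_{\tilde{\mathcal{E}}}$ for an $m$-rectifiable set $\tilde{\mathcal{E}} \subseteq \mathbb{R}^M$. Then there exists a
support $\mathcal{E} \subseteq \tilde{\mathcal{E}}$. Furthermore, the support is unique up to sets
of $\mathscr{H}^m$-measure zero.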

Proof: See Appendix A. ■ 
Remark 10: For $m$-rectifiable measures, it is possible to
interpret the Hausdorff density $\theta^m_\mu(\mathbf{x})$ as a measure of “local
probability per area.” Indeed, for an $m$-rectifiable measure $\mu$,
i.e., $\mu \ll \mathscr{H}^m|_{\mathcal{E}}$ for an $m$-rectifiable set $\mathcal{E}$, we can write
$\theta^m_\mu(\mathbf{x})$ in (10) as

$$\theta^m_\mu(\mathbf{x}) = \lim_{r \to 0} \frac{\mu(\mathcal{B}_r(\mathbf{x}))}{\omega(m)\, r^m} \tag{11}$$

$\mathscr{H}^m|_{\mathcal{E}}$-almost everywhere (for a proof see [14, Th. 2.83 and
eq. (2.42)]). Furthermore, the right-hand side in (11) vanishes
for $\mathscr{H}^m$-almost all points not in $\mathcal{E}$. Note the similarity of (11)
with the ad-hoc construction in Section II-B. Indeed, (11) is the
mathematically rigorous formulation of (6). This formulation
also specifies the class of probability measures for
which (6) results in a well-defined quantity.
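The limit in (11) can also be probed by simulation when only samples of $\mu$ are available. The Python sketch below is our own example (names, seed, sample size, and radii are arbitrary choices): $\mu$ is uniform on the unit sphere in $\mathbb{R}^3$, a 2-rectifiable measure with Hausdorff density $1/(4\pi)$ on the sphere. For a point $\mathbf{x}$ on the sphere, the sphere points within Euclidean distance $r \le 2$ form a spherical cap of area $\pi r^2$, so $\mu(\mathcal{B}_r(\mathbf{x})) = r^2/4$ and the ratio in (11) equals $1/(4\pi)$ for every such $r$; the Monte Carlo estimates should agree up to sampling error.

```python
import math
import random

# Illustrative sketch; sample_sphere and density_estimate are hypothetical helpers.

def sample_sphere(rng):
    """Uniform point on the unit sphere in R^3: normalize a standard Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]

def density_estimate(samples, x, r, m):
    """Empirical mu(B_r(x)) divided by omega(m) r^m, i.e., the ratio inside (11)."""
    omega_m = math.pi ** (m / 2) / math.gamma(1 + m / 2)
    inside = sum(1 for s in samples if math.dist(s, x) < r)
    return inside / len(samples) / (omega_m * r ** m)

rng = random.Random(0)
samples = [sample_sphere(rng) for _ in range(100_000)]
north_pole = (0.0, 0.0, 1.0)
estimates = {r: density_estimate(samples, north_pole, r, m=2) for r in (0.5, 0.3)}
for r, est in estimates.items():
    print(r, est, 1 / (4 * math.pi))
```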


C. Rectifiable Random Variables 

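As we are only interested in probability measures and because
information theory is often formulated for random variables,
we define $m$-rectifiable random variables. In what follows,
we consider a random variable $\boldsymbol{\mathsf{x}} \colon (\Omega, \mathfrak{S}) \to (\mathbb{R}^M, \mathfrak{B}_M)$
on a probability space $(\Omega, \mathfrak{S}, \mu)$, i.e., $\Omega$ is a set, $\mathfrak{S}$ is a
$\sigma$-algebra on $\Omega$, and $\mu$ is a probability measure on $(\Omega, \mathfrak{S})$.
The probability measure induced by the random variable $\boldsymbol{\mathsf{x}}$
is denoted by $\mu\boldsymbol{\mathsf{x}}^{-1}$. For $\mathcal{A} \in \mathfrak{B}_M$, $\mu\boldsymbol{\mathsf{x}}^{-1}(\mathcal{A})$ equals the
probability that $\boldsymbol{\mathsf{x}} \in \mathcal{A}$, i.e.,

$$\mu\boldsymbol{\mathsf{x}}^{-1}(\mathcal{A}) = \mu\big(\boldsymbol{\mathsf{x}}^{-1}(\mathcal{A})\big) = \Pr\{\boldsymbol{\mathsf{x}} \in \mathcal{A}\}. \tag{12}$$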

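Definition 11: A random variable $\boldsymbol{\mathsf{x}} \colon \Omega \to \mathbb{R}^M$ on a probability
space $(\Omega, \mathfrak{S}, \mu)$ is called $m$-rectifiable if the induced
probability measure $\mu\boldsymbol{\mathsf{x}}^{-1}$ on $\mathbb{R}^M$ is $m$-rectifiable, i.e., there
exists an $m$-rectifiable set $\mathcal{E} \subseteq \mathbb{R}^M$ such that $\mu\boldsymbol{\mathsf{x}}^{-1} \ll \mathscr{H}^m|_{\mathcal{E}}$.
The $m$-dimensional Hausdorff density of an $m$-rectifiable
random variable $\boldsymbol{\mathsf{x}}$ is defined as (cf. (10))

$$\theta^m_{\boldsymbol{\mathsf{x}}}(\mathbf{x}) = \theta^m_{\mu\boldsymbol{\mathsf{x}}^{-1}}(\mathbf{x}) = \frac{d\mu\boldsymbol{\mathsf{x}}^{-1}}{d\mathscr{H}^m|_{\mathcal{E}}}(\mathbf{x}). \tag{13}$$

Furthermore, a support of the measure $\mu\boldsymbol{\mathsf{x}}^{-1}$ is called a support
of $\boldsymbol{\mathsf{x}}$, i.e., $\mathcal{E}$ is a support of $\boldsymbol{\mathsf{x}}$ if $\mu\boldsymbol{\mathsf{x}}^{-1} \ll \mathscr{H}^m|_{\mathcal{E}}$,
$\theta^m_{\boldsymbol{\mathsf{x}}}(\mathbf{x}) > 0$ $\mathscr{H}^m|_{\mathcal{E}}$-almost everywhere, and $\mathcal{E} = \bigcup_{k \in \mathbb{N}} f_k(\mathcal{A}_k)$, where,
for $k \in \mathbb{N}$, $\mathcal{A}_k$ is a bounded Borel set and $f_k \colon \mathbb{R}^m \to \mathbb{R}^M$ is
a Lipschitz function that is one-to-one on $\mathcal{A}_k$.

Note that due to Remark 7, an $m$-rectifiable random variable
cannot be $n$-rectifiable for $n \ne m$.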

In the nontrivial case $m < M$, the $m$-dimensional Hausdorff density $\theta^m_{\mathsf{x}}$ is not a probability density function in the classical sense and is nonzero only on an $m$-dimensional set $\mathcal{E}$. Indeed, the distribution of $\mathsf{x}$ is concentrated on a set of Lebesgue measure zero, so a probability density function with respect to the Lebesgue measure on $\mathbb{R}^M$ cannot exist. However, the $m$-dimensional Hausdorff measure of the support set does not vanish, and one can think of $\theta^m_{\mathsf{x}}$ as an $m$-dimensional probability density function of the random variable $\mathsf{x}$ on $\mathbb{R}^M$.

Based on our discussion of rectifiable measures in Section III-B, we can find a characterization of $m$-rectifiable random variables that resembles well-known properties of continuous random variables. This characterization is stated in the next corollary. Note, however, that although everything seems similar to the continuous case, Hausdorff measures lack substantial properties of the Lebesgue measure. For example, the product measure is not always again a Hausdorff measure.

Corollary 12: Let $\mathsf{x}$ be an $m$-rectifiable random variable on $\mathbb{R}^M$, i.e., $\mu\mathsf{x}^{-1} \ll \mathscr{H}^m|_{\mathcal{E}}$ for an $m$-rectifiable set $\mathcal{E} \subseteq \mathbb{R}^M$. Then the $m$-dimensional Hausdorff density $\theta^m_{\mathsf{x}}$ exists, and the following properties hold:

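1) The probability $\Pr\{\mathsf{x} \in \mathcal{A}\}$ for a measurable set $\mathcal{A} \subseteq \mathbb{R}^M$ can be calculated as the integral of $\theta^m_{\mathsf{x}}$ over $\mathcal{A}$ with respect to the $m$-dimensional Hausdorff measure restricted to $\mathcal{E}$, i.e.,

$$\Pr\{\mathsf{x} \in \mathcal{A}\} = \mu\mathsf{x}^{-1}(\mathcal{A}) = \int_{\mathcal{A}} \theta^m_{\mathsf{x}}(x)\, d\mathscr{H}^m|_{\mathcal{E}}(x). \tag{14}$$

2) The expectation of a measurable function $f\colon \mathbb{R}^M \to \mathbb{R}$ with respect to the random variable $\mathsf{x}$ can be expressed as

$$\mathrm{E}_{\mathsf{x}}[f(\mathsf{x})] = \int_{\mathbb{R}^M} f(x)\, \theta^m_{\mathsf{x}}(x)\, d\mathscr{H}^m|_{\mathcal{E}}(x). \tag{15}$$

3) The random variable $\mathsf{x}$ is in $\mathcal{E}$ with probability one, i.e.,

$$\Pr\{\mathsf{x} \in \mathcal{E}\} = \mu\mathsf{x}^{-1}(\mathcal{E}) = \int_{\mathcal{E}} \theta^m_{\mathsf{x}}(x)\, d\mathscr{H}^m|_{\mathcal{E}}(x) = 1. \tag{16}$$

4) There exists a support $\tilde{\mathcal{E}} \subseteq \mathcal{E}$ of $\mathsf{x}$.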

The special cases m = 0 and m = M reduce to well-known 
concepts. 

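Theorem 13: Let $\mathsf{x}$ be a random variable on $\mathbb{R}^M$. Then:

1) $\mathsf{x}$ is 0-rectifiable if and only if it is a discrete random variable, i.e., there exists a probability mass function $p_{\mathsf{x}}(x_i) = \Pr\{\mathsf{x} = x_i\} > 0$, $i \in \mathcal{I}$, where $\mathcal{I}$ is a finite or countably infinite index set indicating all possible realizations $x_i$ of $\mathsf{x}$. In this case, $\theta^0_{\mathsf{x}} = p_{\mathsf{x}}$ and $\mathcal{E} = \{x_i : i \in \mathcal{I}\}$ is a support of $\mathsf{x}$.

2) $\mathsf{x}$ is $M$-rectifiable if and only if it is a continuous random variable, i.e., there exists a probability density function $f_{\mathsf{x}}$ such that $\Pr\{\mathsf{x} \in \mathcal{A}\} = \int_{\mathcal{A}} f_{\mathsf{x}}(x)\, d\mathscr{L}^M(x)$. In this case, $\theta^M_{\mathsf{x}} = f_{\mathsf{x}}$ $\mathscr{L}^M$-almost everywhere.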

Proof: See Appendix B. ■ 

The following theorem introduces a nontrivial class of $m$-rectifiable random variables.

Theorem 14: Let $\mathsf{x}$ be a continuous random variable on $\mathbb{R}^m$. Furthermore, let $\phi\colon \mathbb{R}^m \to \mathbb{R}^M$ with $M \geq m$ be a locally Lipschitz mapping whose $m$-dimensional Jacobian determinant⁹ satisfies $J_\phi(x) > 0$ $\mathscr{L}^m$-almost everywhere, and assume that $\phi(\mathbb{R}^m)$ is $\mathscr{H}^m$-measurable. Then $\mathsf{y} = \phi(\mathsf{x})$ is an $m$-rectifiable random variable on $\mathbb{R}^M$.

⁹ The $m$-dimensional Jacobian determinant of $\phi$ is defined as $J_\phi(x) = \sqrt{\det\big(D_\phi^T(x)\, D_\phi(x)\big)}$, where $D_\phi(x) \in \mathbb{R}^{M\times m}$ denotes the Jacobian matrix of $\phi$, which is guaranteed to exist almost everywhere. Note in particular that $J_\phi(x)$ is nonnegative.

Proof: According to Definition 11, we have to show that $\mu\mathsf{y}^{-1} \ll \mathscr{H}^m|_{\mathcal{E}}$ for an $m$-rectifiable set $\mathcal{E} \subseteq \mathbb{R}^M$. By Properties 1, 3, and 6 in Lemma 4, the set $\mathcal{E}_r = \phi(\mathcal{B}_r(0))$ is $m$-rectifiable ($\phi$ is Lipschitz on $\mathcal{B}_r(0)$ for all $r > 0$). Hence, by Property 5 in Lemma 4, the set $\mathcal{E} = \phi(\mathbb{R}^m) = \bigcup_{r\in\mathbb{N}} \phi(\mathcal{B}_r(0))$ is $m$-rectifiable. Thus, it suffices to show that $\mu\mathsf{y}^{-1} \ll \mathscr{H}^m|_{\phi(\mathbb{R}^m)}$, i.e., that for any $\mathscr{H}^m$-measurable set $\mathcal{A} \subseteq \mathbb{R}^M$, $\mathscr{H}^m|_{\phi(\mathbb{R}^m)}(\mathcal{A}) = 0$ implies $\mu\mathsf{y}^{-1}(\mathcal{A}) = 0$. To this end, assume first that $\mathscr{H}^m|_{\phi(\mathbb{R}^m)}(\mathcal{A}) = 0$ for a bounded $\mathscr{H}^m$-measurable set $\mathcal{A} \subseteq \mathbb{R}^M$. Let $f$ denote the probability density function of $\mathsf{x}$. By the generalized change of variables formula [14, eq. (2.47)], we have


It-HA) 


f{x)J 4> (x) dJf m (x) 

[ E f(x)dJ? m (y) 

^ 1 (- 4 )) xe4>- 1 {A)r\4>-'>-({y}) 

/ E 

xerp- 1 {A)r\4i- 1 ({y}) 


( = } 0 


(17) 


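where (a) holds because $\mathscr{H}^m(\mathcal{A} \cap \phi(\mathbb{R}^m)) = 0$. Because $J_\phi(x) > 0$ $\mathscr{L}^m$-almost everywhere, (17) implies $f(x) = 0$ $\mathscr{L}^m$-almost everywhere on $\phi^{-1}(\mathcal{A})$, and hence $\int_{\phi^{-1}(\mathcal{A})} f(x)\, d\mathscr{L}^m(x) = 0$. Thus, we have

$$\mu\mathsf{y}^{-1}(\mathcal{A}) = \mu\mathsf{x}^{-1}(\phi^{-1}(\mathcal{A})) = \int_{\phi^{-1}(\mathcal{A})} f(x)\, d\mathscr{L}^m(x) = 0.$$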

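For an unbounded $\mathscr{H}^m$-measurable set $\mathcal{A} \subseteq \mathbb{R}^M$ satisfying $\mathscr{H}^m|_{\phi(\mathbb{R}^m)}(\mathcal{A}) = 0$, the arguments above yield $\mu\mathsf{y}^{-1}(\mathcal{A} \cap \mathcal{B}_r(0)) = 0$ for the bounded sets $\mathcal{A} \cap \mathcal{B}_r(0)$, $r \in \mathbb{N}$. This implies $\mu\mathsf{y}^{-1}(\mathcal{A}) \leq \sum_{r\in\mathbb{N}} \mu\mathsf{y}^{-1}(\mathcal{A} \cap \mathcal{B}_r(0)) = 0$. ■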


D. Example: Distributions on the Unit Circle 

As a basic example of 1-rectifiable singular random variables, we consider distributions on the unit circle in $\mathbb{R}^2$, i.e., on $\mathcal{S}_1 = \{x \in \mathbb{R}^2 : \|x\| = 1\}$.

Corollary 15: Let $\mathsf{z}$ be a continuous random variable on $\mathbb{R}$. Then $\mathsf{x} = (\mathsf{x}_1\ \mathsf{x}_2)^T = (\cos\mathsf{z}\ \sin\mathsf{z})^T$ is a 1-rectifiable random variable.

Proof: The mapping $\phi\colon z \mapsto (\cos z\ \sin z)^T$ is Lipschitz and its Jacobian determinant is identically one. Thus, we can directly apply Theorem 14. ■
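The $m$-dimensional Jacobian determinant of footnote 9 can also be checked numerically. The following Python sketch is only an illustration and is not part of the original analysis. It approximates $D_\phi$ by central finite differences (the step size `h` and all helper names are our own choices) and evaluates $\sqrt{\det(D_\phi^T D_\phi)}$ for the map of Corollary 15. The result should be close to one.

```python
import math

def jacobian_fd(phi, x, h=1e-6):
    """Central-difference Jacobian of phi: R^m -> R^M at x (M rows, m columns)."""
    m = len(x)
    cols = []
    for j in range(m):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        cols.append([(a - b) / (2.0 * h) for a, b in zip(phi(xp), phi(xm))])
    return [[cols[j][i] for j in range(m)] for i in range(len(cols[0]))]

def det(a):
    """Determinant of a small square matrix by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in a]
    n, d = len(a), 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            return 0.0
        if p != k:
            a[k], a[p] = a[p], a[k]
            d = -d
        d *= a[k][k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
    return d

def jacobian_determinant(phi, x):
    """m-dimensional Jacobian determinant sqrt(det(D^T D)) as in footnote 9."""
    D = jacobian_fd(phi, x)
    m = len(x)
    gram = [[sum(row[i] * row[j] for row in D) for j in range(m)] for i in range(m)]
    return math.sqrt(max(det(gram), 0.0))

def circle(z):
    """phi(z) = (cos z, sin z)^T from Corollary 15 (z is a length-1 list)."""
    return [math.cos(z[0]), math.sin(z[0])]

for z in (0.0, 0.7, 2.0, -3.1):
    print(z, jacobian_determinant(circle, [z]))
```

If `circle` is replaced by a map onto a circle of radius 2, the values become close to 2. This reflects the length distortion that $J_\phi$ measures.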

This toy example is intuitive and illustrates the concept 
of m-rectifiable singular random variables in a very simple 
setup. In a similar way, one can analyze the rectifiability of 
distributions on various other geometric structures. 


E. Example: Positive Semidefinite Rank-One Random Matrices

A less obvious example of an $m$-rectifiable singular random variable is given by positive semidefinite rank-one random matrices, i.e., matrices of the form $\mathsf{X} = \mathsf{z}\mathsf{z}^T \in \mathbb{R}^{m\times m}$, where $\mathsf{z}$ is a continuous random variable on $\mathbb{R}^m$.

Corollary 16: Let $\mathsf{z}$ be a continuous random variable on $\mathbb{R}^m$. Then the random matrix $\mathsf{X} = \mathsf{z}\mathsf{z}^T$ is $m$-rectifiable on $\mathbb{R}^{m^2}$.

Proof: The mapping $\phi\colon z \mapsto zz^T$ is locally Lipschitz. Thus, in order to apply Theorem 14, it remains to show that $J_\phi(z) > 0$ $\mathscr{L}^m$-almost everywhere. To calculate the Jacobian matrix $D_\phi(z)$, we stack the columns of the matrix $zz^T$ and differentiate the resulting vector with respect to each element $z_i$. It is easily seen that the resulting Jacobian matrix is given by

$$D_\phi(z) = \begin{pmatrix} z e_1^T + z_1 I_m \\ z e_2^T + z_2 I_m \\ \vdots \\ z e_m^T + z_m I_m \end{pmatrix} \tag{18}$$

where $e_i$ denotes the $i$th unit vector. As long as at least one element $z_i$ is nonzero, $D_\phi(z)$ has full rank. Thus, $J_\phi(z) > 0$ $\mathscr{L}^m$-almost everywhere. ■
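Formula (18) and the full-rank claim can be sanity-checked numerically. The Python sketch below is our own illustration, and its helper names are hypothetical. It builds $D_\phi(z)$ from (18), compares it with a finite-difference Jacobian of the column-stacked matrix $zz^T$, and checks that $D_\phi^T(z) D_\phi(z) = 2\|z\|^2 I_m + 2zz^T$. This matrix is positive definite for every $z \neq 0$.

```python
import random

def vec_outer(z):
    """Stack the columns of z z^T into a vector of length m^2 (column j holds z_i * z_j)."""
    m = len(z)
    return [z[i] * z[j] for j in range(m) for i in range(m)]

def jacobian_18(z):
    """Jacobian (18): row block j (rows j*m, ..., j*m + m - 1) equals z e_j^T + z_j I_m."""
    m = len(z)
    return [[z[i] * float(k == j) + z[j] * float(k == i) for k in range(m)]
            for j in range(m) for i in range(m)]

def jacobian_fd(f, z, h=1e-5):
    """Central-difference Jacobian of f at z (rows: outputs, columns: inputs)."""
    cols = []
    for k in range(len(z)):
        zp, zm = list(z), list(z)
        zp[k] += h
        zm[k] -= h
        cols.append([(a - b) / (2.0 * h) for a, b in zip(f(zp), f(zm))])
    return [[col[r] for col in cols] for r in range(len(cols[0]))]

def gram(D):
    """Compute D^T D for a matrix stored as a list of rows."""
    m = len(D[0])
    return [[sum(row[i] * row[j] for row in D) for j in range(m)] for i in range(m)]

rng = random.Random(0)
m = 3
z = [rng.gauss(0.0, 1.0) for _ in range(m)]
D = jacobian_18(z)
err_fd = max(abs(a - b) for ra, rb in zip(D, jacobian_fd(vec_outer, z)) for a, b in zip(ra, rb))
nz2 = sum(v * v for v in z)
G = gram(D)
err_gram = max(abs(G[i][j] - (2.0 * nz2 * float(i == j) + 2.0 * z[i] * z[j]))
               for i in range(m) for j in range(m))
print(err_fd, err_gram)
```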

Remark 17: For the case of positive definite random matrices, i.e., $\mathsf{X}_m = \sum_{i=1}^{m} \mathsf{z}_i\mathsf{z}_i^T$ with independent continuous $\mathsf{z}_i$, it is easy to see that the measures induced by these random matrices are absolutely continuous with respect to the $m(m+1)/2$-dimensional Lebesgue measure on the space of all symmetric matrices. The intermediate case of positive semidefinite rank-deficient random matrices $\mathsf{X}_n = \sum_{i=1}^{n} \mathsf{z}_i\mathsf{z}_i^T$ for $n \in \{2, \dots, m-1\}$, where the $\mathsf{z}_i \in \mathbb{R}^m$, $i \in \{1, \dots, n\}$ are independent continuous random variables, is considerably more involved. The reason is that the mapping $(z_1, \dots, z_n) \mapsto \sum_{i=1}^{n} z_i z_i^T$ has a vanishing Jacobian determinant almost everywhere. We conjecture that $\mathsf{X}_n$ is $(mn - n(n-1)/2)$-rectifiable, conforming to the dimension of the manifold of all positive semidefinite rank-$n$ matrices with $n$ distinct eigenvalues.


IV. Entropy of Rectifiable Random Variables 


A. Definition 

The $m$-rectifiable random variables introduced in Definition 11 will be the objects considered in our entropy definition. Due to the existence of the $m$-dimensional Hausdorff density $\theta^m_{\mathsf{x}}$ for these random variables (see (11) and (13)), the heuristic approach described in Section II-B (see (6) and (7)) can be made rigorous.

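Definition 18: Let $\mathsf{x}$ be an $m$-rectifiable random variable on $\mathbb{R}^M$. The $m$-dimensional entropy of $\mathsf{x}$ is defined as

$$\mathfrak{h}^m(\mathsf{x}) = -\mathrm{E}_{\mathsf{x}}\big[\log\theta^m_{\mathsf{x}}(\mathsf{x})\big] = -\int_{\mathbb{R}^M} \log\theta^m_{\mathsf{x}}(x)\, d\mu\mathsf{x}^{-1}(x) \tag{19}$$

provided the integral on the right-hand side exists in $\mathbb{R} \cup \{\pm\infty\}$.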

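By (15), we obtain

$$\mathfrak{h}^m(\mathsf{x}) = -\int_{\mathbb{R}^M} \theta^m_{\mathsf{x}}(x) \log\theta^m_{\mathsf{x}}(x)\, d\mathscr{H}^m|_{\mathcal{E}}(x) \tag{20}$$

$$= -\int_{\mathcal{E}} \theta^m_{\mathsf{x}}(x) \log\theta^m_{\mathsf{x}}(x)\, d\mathscr{H}^m(x) \tag{21}$$

where $\mathcal{E} \subseteq \mathbb{R}^M$ is an arbitrary $m$-rectifiable set satisfying $\mu\mathsf{x}^{-1} \ll \mathscr{H}^m|_{\mathcal{E}}$ (in particular, $\mathcal{E}$ may be a support of $\mathsf{x}$).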

Remark 19: For a fixed $m$-rectifiable measure $\mu$, our entropy definition (19) can be interpreted as a generalized entropy (8) with $\lambda = \mathscr{H}^m|_{\mathcal{E}}$, where $\mathcal{E}$ is a support of $\mu$. This will allow us to use basic results from [12] for our entropy definition. However, our definition changes the measure $\lambda$ based on the choice of $\mu$ and thus is not simply a special case of (8).
B. Transformation Property

One important property of differential entropy is its invariance under unitary transformations. A similar result holds for $m$-dimensional entropy. We can even give a more general result for arbitrary one-to-one Lipschitz mappings.

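Theorem 20: Let $\mathsf{x}$ be an $m$-rectifiable random variable on $\mathbb{R}^N$ with $1 \leq m \leq N$, finite $m$-dimensional entropy $\mathfrak{h}^m(\mathsf{x})$, support $\mathcal{E}$, and $m$-dimensional Hausdorff density $\theta^m_{\mathsf{x}}$. Furthermore, let $\phi\colon \mathbb{R}^N \to \mathbb{R}^M$ with $M \geq m$ be a Lipschitz mapping such that $J^{\mathcal{E}}_\phi > 0$ $\mathscr{H}^m|_{\mathcal{E}}$-almost everywhere¹⁰, $\phi(\mathcal{E})$ is $\mathscr{H}^m$-measurable, and $\mathrm{E}_{\mathsf{x}}[\log J^{\mathcal{E}}_\phi(\mathsf{x})]$ exists and is finite. If the restriction of $\phi$ to $\mathcal{E}$ is one-to-one, then $\mathsf{y} = \phi(\mathsf{x})$ is an $m$-rectifiable random variable with $m$-dimensional Hausdorff density

$$\theta^m_{\mathsf{y}}(y) = \frac{\theta^m_{\mathsf{x}}(\phi^{-1}(y))}{J^{\mathcal{E}}_\phi(\phi^{-1}(y))}$$

$\mathscr{H}^m|_{\phi(\mathcal{E})}$-almost everywhere, and its $m$-dimensional entropy is

$$\mathfrak{h}^m(\mathsf{y}) = \mathfrak{h}^m(\mathsf{x}) + \mathrm{E}_{\mathsf{x}}\big[\log J^{\mathcal{E}}_\phi(\mathsf{x})\big].$$

Proof: See Appendix C. ■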

Remark 21: Theorem 20 shows that for the special case of a unitary transformation or a translation $\phi$,

$$\mathfrak{h}^m(\phi(\mathsf{x})) = \mathfrak{h}^m(\mathsf{x})$$

because $J^{\mathcal{E}}_\phi(x)$ is identically one in that case.
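A uniform distribution on a line segment in $\mathbb{R}^2$ is a case where Theorem 20 and Remark 21 can be checked by hand. This example is our own illustrative choice, with $N = M = 2$ and $m = 1$. The Hausdorff density is $1/L$ on a segment of length $L$, so (21) gives $\log L$. For a linear map $\phi(x) = Ax$, the tangential Jacobian $J^{\mathcal{E}}_\phi$ is the stretch factor $\|Au\|$ of the unit direction $u$ of the segment. The Python sketch below compares both sides of the entropy formula in Theorem 20. Note that $J^{\mathcal{E}}_\phi$ generally differs from $|\det A|$.

```python
import math

def apply(A, v):
    """Matrix-vector product for a 2x2 matrix A (list of rows) and a 2-vector v."""
    return (A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1])

def segment_entropy(p, q):
    """1-dimensional entropy of a uniform distribution on the segment [p, q] in R^2.
    The Hausdorff density is 1/L on the segment, so (21) gives log L."""
    return math.log(math.dist(p, q))

def tangential_jacobian(A, p, q):
    """J^E_phi for phi(x) = A x on the segment: the length of A u for the unit direction u."""
    L = math.dist(p, q)
    u = ((q[0] - p[0]) / L, (q[1] - p[1]) / L)
    return math.hypot(*apply(A, u))

p, q = (0.0, 0.0), (3.0, 4.0)
A = [[2.0, 1.0], [0.0, 1.0]]           # |det A| = 2, but the segment is stretched by ||A u||
lhs = segment_entropy(apply(A, p), apply(A, q))
rhs = segment_entropy(p, q) + math.log(tangential_jacobian(A, p, q))
print(lhs, rhs, tangential_jacobian(A, p, q))

t = 0.9                                 # a rotation, i.e., a unitary map (cf. Remark 21)
R = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
print(segment_entropy(apply(R, p), apply(R, q)) - segment_entropy(p, q))
```

Here $\mathrm{E}_{\mathsf{x}}[\log J^{\mathcal{E}}_\phi(\mathsf{x})]$ reduces to the constant $\log\|Au\|$. With the rotation `R`, the two entropies agree up to rounding.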

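Remark 22: In general, no result resembling Theorem 20 holds for Lipschitz functions $\phi\colon \mathbb{R}^N \to \mathbb{R}^M$ that are not one-to-one on $\mathcal{E}$. Arguing as in the proof of Theorem 20, we obtain that $\mathsf{y} = \phi(\mathsf{x})$ is $m$-rectifiable and that its $m$-dimensional Hausdorff density is

$$\theta^m_{\mathsf{y}}(y) = \sum_{x \in \phi^{-1}(\{y\})} \frac{\theta^m_{\mathsf{x}}(x)}{J^{\mathcal{E}}_\phi(x)}$$

$\mathscr{H}^m|_{\phi(\mathcal{E})}$-almost everywhere. We then obtain for the $m$-dimensional entropy

$$\begin{aligned}
\mathfrak{h}^m(\mathsf{y}) &= -\int_{\phi(\mathcal{E})} \theta^m_{\mathsf{y}}(y) \log\theta^m_{\mathsf{y}}(y)\, d\mathscr{H}^m(y) \\
&\overset{(a)}{=} -\int_{\mathcal{E}} \theta^m_{\mathsf{x}}(x) \log\Bigg(\sum_{x' \in \phi^{-1}(\{\phi(x)\})} \frac{\theta^m_{\mathsf{x}}(x')}{J^{\mathcal{E}}_\phi(x')}\Bigg) d\mathscr{H}^m(x)
\end{aligned}$$

where (a) holds because of the generalized area formula [14, Th. 2.91]. In most cases, this cannot be easily expressed in terms of a differential entropy due to the sum in the logarithm. Consider, however, the special case of a Jacobian determinant and a Hausdorff density $\theta^m_{\mathsf{x}}$ that are symmetric in the sense that $\theta^m_{\mathsf{x}}(x')$ and $J^{\mathcal{E}}_\phi(x')$ are constant on $\phi^{-1}(\{\phi(x)\})$ for all $x \in \mathcal{E}$. In that case, the summation reduces to a multiplication by the cardinality of $\phi^{-1}(\{\phi(x)\})$.

¹⁰ Here $J^{\mathcal{E}}_\phi$ denotes the Jacobian determinant of the tangential differential of $\phi$ in $\mathcal{E}$. For details see [18, Sec. 3.2.16].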

C. Relation to Entropy and Differential Entropy 

In the special cases $m = 0$ and $m = M$, our entropy definition reduces to classical entropy (1) and differential entropy (2), respectively.

Theorem 23: Let $\mathsf{x}$ be a random variable on $\mathbb{R}^M$. If $\mathsf{x}$ is a 0-rectifiable (i.e., discrete) random variable, then the 0-dimensional entropy of $\mathsf{x}$ coincides with the classical entropy, i.e., $\mathfrak{h}^0(\mathsf{x}) = H(\mathsf{x})$. If $\mathsf{x}$ is an $M$-rectifiable (i.e., continuous) random variable, then the $M$-dimensional entropy of $\mathsf{x}$ coincides with the differential entropy, i.e., $\mathfrak{h}^M(\mathsf{x}) = h(\mathsf{x})$.

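Proof: Let $\mathsf{x}$ be a 0-rectifiable random variable. By Theorem 13, $\mathsf{x}$ is a discrete random variable with possible realizations $x_i$, $i \in \mathcal{I}$. The 0-dimensional Hausdorff density $\theta^0_{\mathsf{x}}$ is the probability mass function of $\mathsf{x}$, and a support is given by $\mathcal{E} = \{x_i : i \in \mathcal{I}\}$. Thus, (21) yields

$$\begin{aligned}
\mathfrak{h}^0(\mathsf{x}) &= -\int_{\mathcal{E}} \theta^0_{\mathsf{x}}(x) \log\theta^0_{\mathsf{x}}(x)\, d\mathscr{H}^0(x) \\
&\overset{(a)}{=} -\sum_{i\in\mathcal{I}} \Pr\{\mathsf{x} = x_i\} \log\Pr\{\mathsf{x} = x_i\} \\
&= H(\mathsf{x})
\end{aligned}$$

where (a) holds because $\mathscr{H}^0$ is the counting measure.

Let $\mathsf{x}$ be an $M$-rectifiable random variable. By Theorem 13, $\mathsf{x}$ is a continuous random variable, and the $M$-dimensional Hausdorff density $\theta^M_{\mathsf{x}}$ is equal to the probability density function $f_{\mathsf{x}}$. Thus, (19) yields

$$\mathfrak{h}^M(\mathsf{x}) = -\mathrm{E}_{\mathsf{x}}\big[\log\theta^M_{\mathsf{x}}(\mathsf{x})\big] = -\mathrm{E}_{\mathsf{x}}\big[\log f_{\mathsf{x}}(\mathsf{x})\big] = h(\mathsf{x}).$$

■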
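Theorem 23 can be illustrated with the sample-mean form of (19). For any $m$-rectifiable $\mathsf{x}$ whose Hausdorff density can be evaluated, $-\frac{1}{n}\sum_{k=1}^{n} \log\theta^m_{\mathsf{x}}(x_k)$ over i.i.d. samples $x_k$ estimates $\mathfrak{h}^m(\mathsf{x})$. The following Python sketch is a Monte Carlo illustration with arbitrarily chosen example distributions that are not taken from the text. It applies this estimator to a discrete random variable on $\mathbb{R}^2$, where the estimate should approach $H(\mathsf{x})$. It also applies it to a Gaussian random variable on $\mathbb{R}$, where the estimate should approach $h(\mathsf{x}) = \frac{1}{2}\log(2\pi e\sigma^2)$.

```python
import math
import random

def entropy_counting(pmf):
    """(21) with H^0 = counting measure: -sum_i p_i log p_i."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)

def entropy_mc(sample, density, n=200000, seed=0):
    """Sample-mean estimate of (19), -E[log theta(x)], from n i.i.d. draws."""
    rng = random.Random(seed)
    return -sum(math.log(density(sample(rng))) for _ in range(n)) / n

# 0-rectifiable example on R^2 (illustrative points and probabilities).
pmf = {(0.0, 0.0): 0.5, (1.0, 0.0): 0.25, (0.0, 2.0): 0.25}
points, weights = list(pmf), list(pmf.values())

def sample_discrete(rng):
    return rng.choices(points, weights)[0]

h0_exact = entropy_counting(pmf)        # equals 1.5 * log(2)
h0_mc = entropy_mc(sample_discrete, pmf.get)

# 1-rectifiable (continuous) example on R: Gaussian with standard deviation sigma.
sigma = 2.0

def gauss_pdf(x):
    return math.exp(-x * x / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def sample_gauss(rng):
    return rng.gauss(0.0, sigma)

h1_exact = 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)
h1_mc = entropy_mc(sample_gauss, gauss_pdf)
print(h0_exact, h0_mc, h1_exact, h1_mc)
```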


To get an idea of the $m$-dimensional entropy of random variables in between the discrete and continuous cases, we can use Theorem 14 to construct $m$-rectifiable random variables. More specifically, we consider a continuous random variable $\mathsf{x}$ on $\mathbb{R}^m$ and a one-to-one Lipschitz mapping $\phi\colon \mathbb{R}^m \to \mathbb{R}^M$ ($M \geq m$) whose generalized Jacobian determinant satisfies $J_\phi > 0$ $\mathscr{L}^m$-almost everywhere. Intuitively, we should see a connection between the differential entropy of $\mathsf{x}$ and the $m$-dimensional entropy of $\mathsf{y} = \phi(\mathsf{x})$. By Theorem 14, the random variable $\mathsf{y}$ is $m$-rectifiable and, because $\phi$ is one-to-one, we can indeed calculate the $m$-dimensional entropy.

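Corollary 24: Let $\mathsf{x}$ be a continuous random variable on $\mathbb{R}^m$ with finite differential entropy $h(\mathsf{x})$ and probability density function $f_{\mathsf{x}}$. Furthermore, let $\phi\colon \mathbb{R}^m \to \mathbb{R}^M$ ($M \geq m$) be a one-to-one Lipschitz mapping such that $J_\phi > 0$ $\mathscr{L}^m$-almost everywhere and $\mathrm{E}_{\mathsf{x}}[\log J_\phi(\mathsf{x})]$ exists and is finite. Then the $m$-dimensional Hausdorff density of the $m$-rectifiable random variable $\mathsf{y} = \phi(\mathsf{x})$ is

$$\theta^m_{\mathsf{y}}(y) = \frac{f_{\mathsf{x}}(\phi^{-1}(y))}{J_\phi(\phi^{-1}(y))}$$

$\mathscr{H}^m|_{\phi(\mathbb{R}^m)}$-almost everywhere, and the $m$-dimensional entropy of $\mathsf{y}$ is

$$\mathfrak{h}^m(\mathsf{y}) = h(\mathsf{x}) + \mathrm{E}_{\mathsf{x}}\big[\log J_\phi(\mathsf{x})\big].$$

For the special case of the embedding $\phi\colon \mathbb{R}^m \to \mathbb{R}^M$, $\phi(x_1, \dots, x_m) = (x_1 \cdots x_m\ 0 \cdots 0)^T$, this results in

$$\mathfrak{h}^m\big((\mathsf{x}_1, \dots, \mathsf{x}_m, 0, \dots, 0)^T\big) = h(\mathsf{x}). \tag{22}$$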


Proof: The first part is the special case $N = m$ and $\mathcal{E} = \mathbb{R}^m$ of Theorem 20. The result (22) then follows from the fact that, for the considered embedding, $J_\phi(x)$ is identically 1. ■


D. Example: Entropy of Distributions on the Unit Circle 

It is now easy to calculate the entropy of the 1-rectifiable singular random variables on the unit circle previously considered in Section III-D. Let $\mathsf{z}$ be a continuous random variable on $\mathbb{R}$ with probability density function $f_{\mathsf{z}}$ supported on $[0, 2\pi)$, i.e., $f_{\mathsf{z}}(z) = 0$ for $z \notin [0, 2\pi)$. By Corollary 24, the 1-dimensional Hausdorff density of the random variable $\mathsf{x} = \phi(\mathsf{z}) = (\cos\mathsf{z}\ \sin\mathsf{z})^T$ is given by (recall that the Jacobian determinant is identically one)

$$\theta^1_{\mathsf{x}}(x) = f_{\mathsf{z}}(\phi^{-1}(x)) \tag{23}$$

$\mathscr{H}^1|_{\mathcal{S}_1}$-almost everywhere, and the entropy of $\mathsf{x}$ is given by

$$\mathfrak{h}^1(\mathsf{x}) = h(\mathsf{z}). \tag{24}$$

Of course, this result for $\mathfrak{h}^1(\mathsf{x})$ could have been conjectured by heuristic reasoning. Next, we consider a case where heuristic reasoning does not help.
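Relations (23) and (24) can be checked with a Monte Carlo estimate. In the Python sketch below, the density $f_{\mathsf{z}}(z) = (1 + \frac{1}{2}\cos z)/(2\pi)$ on $[0, 2\pi)$ is an arbitrary example of our own. The sketch proceeds in three steps:

1. It generates points $x = (\cos z\ \sin z)^T$ in $\mathbb{R}^2$.
2. It evaluates $\theta^1_{\mathsf{x}}$ via (23) by recovering the angle of $x$.
3. It compares the sample mean of $-\log\theta^1_{\mathsf{x}}(x)$ with $h(\mathsf{z})$, computed by numerical integration.

```python
import math
import random

TWO_PI = 2.0 * math.pi

def f_z(z):
    """Illustrative density (1 + cos(z)/2) / (2*pi) of z on [0, 2*pi) (an assumption, not from the text).
    The formula is 2*pi-periodic, so it can be evaluated at any representative of the angle."""
    return (1.0 + 0.5 * math.cos(z)) / TWO_PI

def sample_z(rng):
    """Rejection sampling from f_z with a uniform proposal on [0, 2*pi)."""
    while True:
        z = rng.uniform(0.0, TWO_PI)
        if 1.5 * rng.random() <= 1.0 + 0.5 * math.cos(z):
            return z

def theta_x(x):
    """Hausdorff density (23): f_z evaluated at the angle of the point x on the unit circle."""
    return f_z(math.atan2(x[1], x[0]))

def h_z(n=100000):
    """Differential entropy h(z) by the midpoint rule on [0, 2*pi)."""
    dz = TWO_PI / n
    return -dz * sum(f_z((k + 0.5) * dz) * math.log(f_z((k + 0.5) * dz)) for k in range(n))

def h1_x_mc(n=200000, seed=1):
    """Sample-mean estimate of (19) for x = (cos z, sin z)^T, using points in R^2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = sample_z(rng)
        total -= math.log(theta_x((math.cos(z), math.sin(z))))
    return total / n

h_exact = h_z()
h_mc = h1_x_mc()
print(h_exact, h_mc)
```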


E. Example: Entropy of Positive Semidefinite Rank-One Random Matrices


As a more challenging example, we calculate the entropy 
of a specific type of m-rectifiable singular random variables, 
namely, the positive semidefinite rank-one random matrices 
previously considered in Section III-E. 

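Theorem 25: Let $\mathsf{z}$ be a continuous random variable on $\mathbb{R}^m$ with probability density function $f_{\mathsf{z}}$, and let $\tilde{\mathsf{z}}$ denote the random variable with probability density function $f_{\tilde{\mathsf{z}}}(z) = (f_{\mathsf{z}}(z) + f_{\mathsf{z}}(-z))/2$. Then the $m$-dimensional entropy of the random matrix $\mathsf{X} = \mathsf{z}\mathsf{z}^T$ is given by

$$\mathfrak{h}^m(\mathsf{X}) = h(\tilde{\mathsf{z}}) + \frac{m-1}{2}\log 2 + \frac{m}{2}\,\mathrm{E}_{\mathsf{z}}\big[\log\|\mathsf{z}\|^2\big]. \tag{25}$$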


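Proof: We first calculate the Jacobian determinant of the mapping $\phi\colon z \mapsto zz^T$, which is given by $J_\phi(z) = \sqrt{\det\big(D_\phi^T(z)\, D_\phi(z)\big)}$. By (18) and some simple algebraic manipulations, one obtains $D_\phi^T(z) D_\phi(z) = 2\|z\|^2 I_m + 2zz^T$, and further

$$J_\phi(z) = \sqrt{\det\big(2\|z\|^2 I_m + 2zz^T\big)} \overset{(a)}{=} \sqrt{2^m \|z\|^{2(m-1)} \cdot 2\|z\|^2} = 2^{(m+1)/2}\, \|z\|^m \tag{26}$$

where (a) holds due to [21, Example 1.3.24]. Specifically, the matrix $\|z\|^2 I_m + zz^T$ has the eigenvalue $\|z\|^2$ with multiplicity $m-1$ and the eigenvalue $2\|z\|^2$. Because the mapping $\phi\colon z \mapsto zz^T$ is not one-to-one, we cannot directly use Corollary 24. However, along the lines of Remark 22, we obtain

$$\mathfrak{h}^m(\mathsf{X}) = -\int_{\mathbb{R}^m} f_{\mathsf{z}}(z) \log\Bigg(\sum_{z' \in \phi^{-1}(\{\phi(z)\})} \frac{f_{\mathsf{z}}(z')}{J_\phi(z')}\Bigg) d\mathscr{L}^m(z). \tag{27}$$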

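The elements $z' \in \phi^{-1}(\{\phi(z)\})$ are given by $\pm z$. Moreover, $f_{\mathsf{z}}(z) + f_{\mathsf{z}}(-z) = 2 f_{\tilde{\mathsf{z}}}(z)$ and $J_\phi(z) = J_\phi(-z)$ (see (26)). Hence, (27) implies

$$\begin{aligned}
\mathfrak{h}^m(\mathsf{X}) &= -\int_{\mathbb{R}^m} f_{\mathsf{z}}(z) \log\frac{2 f_{\tilde{\mathsf{z}}}(z)}{J_\phi(z)}\, d\mathscr{L}^m(z) \\
&= -\int_{\mathbb{R}^m} f_{\mathsf{z}}(z) \big(\log 2 + \log f_{\tilde{\mathsf{z}}}(z) - \log J_\phi(z)\big)\, d\mathscr{L}^m(z) \\
&= -\log 2 - \int_{\mathbb{R}^m} f_{\mathsf{z}}(z) \log f_{\tilde{\mathsf{z}}}(z)\, d\mathscr{L}^m(z) + \mathrm{E}_{\mathsf{z}}[\log J_\phi(\mathsf{z})] \\
&\overset{(a)}{=} -\log 2 - \frac{1}{2}\int_{\mathbb{R}^m} f_{\mathsf{z}}(z) \log f_{\tilde{\mathsf{z}}}(z)\, d\mathscr{L}^m(z) - \frac{1}{2}\int_{\mathbb{R}^m} f_{\mathsf{z}}(-z) \log f_{\tilde{\mathsf{z}}}(z)\, d\mathscr{L}^m(z) + \mathrm{E}_{\mathsf{z}}[\log J_\phi(\mathsf{z})] \\
&= -\log 2 - \int_{\mathbb{R}^m} f_{\tilde{\mathsf{z}}}(z) \log f_{\tilde{\mathsf{z}}}(z)\, d\mathscr{L}^m(z) + \mathrm{E}_{\mathsf{z}}[\log J_\phi(\mathsf{z})] \\
&= -\log 2 + h(\tilde{\mathsf{z}}) + \mathrm{E}_{\mathsf{z}}[\log J_\phi(\mathsf{z})]
\end{aligned} \tag{28}$$

where (a) holds because $f_{\tilde{\mathsf{z}}}(-z) = f_{\tilde{\mathsf{z}}}(z)$ (substitute $z \to -z$ in one half of the integral). Inserting (26) into (28) gives (25). ■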
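The closed form (26) and the step from (28) to (25) rest on the pointwise identity $-\log 2 + \log J_\phi(z) = \frac{m-1}{2}\log 2 + \frac{m}{2}\log\|z\|^2$. The Python sketch below is an illustrative check with random test points. It evaluates $J_\phi(z) = \sqrt{\det(2\|z\|^2 I_m + 2zz^T)}$ by Gaussian elimination and compares it with $2^{(m+1)/2}\|z\|^m$. It then confirms the identity for a few dimensions $m$.

```python
import math
import random

def det(a):
    """Determinant of a small square matrix by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in a]
    n, d = len(a), 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            return 0.0
        if p != k:
            a[k], a[p] = a[p], a[k]
            d = -d
        d *= a[k][k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
    return d

def jacobian_det_matrix(z):
    """J_phi(z) = sqrt(det(2||z||^2 I_m + 2 z z^T)) for phi(z) = z z^T."""
    m, nz2 = len(z), sum(v * v for v in z)
    G = [[2.0 * nz2 * float(i == j) + 2.0 * z[i] * z[j] for j in range(m)] for i in range(m)]
    return math.sqrt(det(G))

def jacobian_det_closed(z):
    """Closed form (26): 2^((m+1)/2) * ||z||^m."""
    m = len(z)
    return 2.0 ** ((m + 1) / 2) * math.sqrt(sum(v * v for v in z)) ** m

rng = random.Random(3)
results = []
for m in (1, 2, 4, 6):
    z = [rng.gauss(0.0, 1.0) for _ in range(m)]
    J = jacobian_det_matrix(z)
    lhs = -math.log(2.0) + math.log(J)
    rhs = (m - 1) / 2 * math.log(2.0) + m / 2 * math.log(sum(v * v for v in z))
    results.append((J, jacobian_det_closed(z), lhs, rhs))
    print(m, J, jacobian_det_closed(z), lhs - rhs)
```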

A practically interesting special case of symmetric random matrices is constituted by the class of Wishart matrices [22]. A rank-$n$ Wishart matrix is given by $\mathsf{W}_{n,\Sigma} = \sum_{i=1}^{n} \mathsf{z}_i\mathsf{z}_i^T \in \mathbb{R}^{m\times m}$, where the $\mathsf{z}_i$, $i \in \{1, \dots, n\}$ are independent and identically distributed (i.i.d.) Gaussian random variables on $\mathbb{R}^m$ with mean 0 and some nonsingular covariance matrix $\Sigma$. The differential entropy of a full-rank Wishart matrix (i.e., $n \geq m$), considered as a random variable in the $m(m+1)/2$-dimensional space of symmetric matrices, is given by [23, eq. (B.82)]

$$h(\mathsf{W}_{n,\Sigma}) = \log\Big(2^{mn/2}\, \Gamma_m\big(\tfrac{n}{2}\big)\, (\det\Sigma)^{n/2}\Big) + \frac{mn}{2} - \frac{n-m-1}{2}\, \mathrm{E}\big[\log\det(\mathsf{W}_{n,\Sigma})\big] \tag{29}$$

where $\Gamma_m(\cdot)$ denotes the multivariate gamma function. In our setting, full-rank Wishart matrices can be interpreted as $m(m+1)/2$-rectifiable random variables in the $m^2$-dimensional space of all $m\times m$ matrices. This follows by considering the embedding of symmetric matrices into the space of all matrices and using Theorem 14. Using this interpretation, we can apply Corollary 24 and obtain $h(\mathsf{W}_{n,\Sigma}) = \mathfrak{h}^{m(m+1)/2}(\mathsf{W}_{n,\Sigma})$.
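The case $m = 1$ gives a consistency check of (29). Then $\mathsf{W}_{n,\sigma^2} = \sigma^2\chi^2_n$ follows a gamma distribution with shape $n/2$ and scale $2\sigma^2$. For shape $k$ and scale $\vartheta$, its differential entropy is $k + \log\vartheta + \log\Gamma(k) + (1-k)\psi(k)$, where $\psi$ is the digamma function. The Python sketch below evaluates (29) using the standard identity $\mathrm{E}[\log\det\mathsf{W}_{n,\Sigma}] = \sum_{j=1}^{m}\psi\big(\frac{n-j+1}{2}\big) + m\log 2 + \log\det\Sigma$ and compares the two expressions. The `digamma` routine is a common recurrence-plus-asymptotic-series approximation written for this sketch, not a standard-library function.

```python
import math

def digamma(x):
    """Digamma function for x > 0: shift x upward, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252 - inv2 * (1 / 240 - inv2 / 132))))
    return result + math.log(x) - 0.5 / x - series

def log_multigamma(a, m):
    """log Gamma_m(a) = m(m-1)/4 log(pi) + sum_{j=1}^{m} log Gamma(a + (1 - j)/2)."""
    return m * (m - 1) / 4 * math.log(math.pi) + sum(math.lgamma(a + (1 - j) / 2) for j in range(1, m + 1))

def wishart_entropy(m, n, logdet_sigma):
    """Differential entropy (29) of a full-rank Wishart matrix W_{n,Sigma}, n >= m."""
    e_logdet = sum(digamma((n - j + 1) / 2) for j in range(1, m + 1)) + m * math.log(2.0) + logdet_sigma
    return (n / 2 * logdet_sigma + m * n / 2 * math.log(2.0) + log_multigamma(n / 2, m)
            + m * n / 2 - (n - m - 1) / 2 * e_logdet)

def gamma_entropy(k, scale):
    """Differential entropy of a gamma distribution with shape k and the given scale."""
    return k + math.log(scale) + math.lgamma(k) + (1 - k) * digamma(k)

sigma2 = 1.7
pairs = [(wishart_entropy(1, n, math.log(sigma2)), gamma_entropy(n / 2, 2 * sigma2)) for n in (1, 2, 5, 10)]
print(pairs)
```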

The case of rank-deficient Wishart matrices, i.e., $n \in \{1, \dots, m-1\}$, has not been analyzed information-theoretically so far. For simplicity, we will consider the case of rank-one Wishart matrices, i.e., $\mathsf{W}_{1,\Sigma} = \mathsf{z}\mathsf{z}^T \in \mathbb{R}^{m\times m}$. The $m$-dimensional entropy of $\mathsf{W}_{1,\Sigma}$ is given by (25) in Theorem 25.
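Because $\mathsf{z}$ is Gaussian with mean 0, its density is symmetric. Hence, $\tilde{\mathsf{z}}$ in Theorem 25 has the same distribution as $\mathsf{z}$, so that (25) simplifies to

$$\mathfrak{h}^m(\mathsf{W}_{1,\Sigma}) = h(\mathsf{z}) + \frac{m-1}{2}\log 2 + \frac{m}{2}\,\mathrm{E}_{\mathsf{z}}\big[\log\|\mathsf{z}\|^2\big].$$

Again using the Gaussianity of $\mathsf{z}$, we obtain further

$$\begin{aligned}
\mathfrak{h}^m(\mathsf{W}_{1,\Sigma}) &= \log\big((2\pi e)^{m/2} (\det\Sigma)^{1/2}\big) + \frac{m-1}{2}\log 2 + \frac{m}{2}\,\mathrm{E}_{\mathsf{z}}\big[\log\|\mathsf{z}\|^2\big] \\
&= \log\big(2^{m-1/2}\, \pi^{m/2}\, (\det\Sigma)^{1/2}\big) + \frac{m}{2} + \frac{m}{2}\,\mathrm{E}_{\mathsf{z}}\big[\log\|\mathsf{z}\|^2\big].
\end{aligned} \tag{30}$$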

If $\mathsf{z}$ contains independent standard normal entries, then $\|\mathsf{z}\|^2$ is $\chi^2_m$ distributed and $\mathrm{E}_{\mathsf{z}}[\log\|\mathsf{z}\|^2] = \psi(m/2) + \log 2$, where $\psi(\cdot)$ denotes the digamma function [23, eq. (B.81)]. It is interesting to compare (30) with the differential entropy of the full-rank Wishart matrix as given by (29). Although there is a formal similarity, we emphasize that the differential entropy in (29) cannot be trivially extended to the setting $n < m$, because neither $\Gamma_m(n/2)$ nor $\log\det(\mathsf{W}_{n,\Sigma})$ is defined in this case. We conjecture that an expression similar to (30) can be derived for other rank-deficient Wishart matrices. However, as mentioned in Section III-E, the analysis of these matrices is significantly more involved and, thus, beyond the scope of this paper.

Remark 26: A different approach to defining an entropy for 
rank-deficient Wishart matrices would be to use a coordinate 
system on the manifold of all positive semidefinite matrices 
of rank n and calculate a probability density function with 
respect to volume elements of this manifold. Such a density 
was calculated for Wishart matrices in [22], and could be used 
for an alternative entropy definition. 

V. Joint Entropy 

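Joint entropy is a widely used concept, although it can be covered by the general concept of higher-dimensional entropy: a pair of random variables $(\mathsf{x}, \mathsf{y})$ with $\mathsf{x} \in \mathbb{R}^{M_1}$ and $\mathsf{y} \in \mathbb{R}^{M_2}$ can also be interpreted as a single random variable on $\mathbb{R}^{M_1+M_2}$. Thus, our concept of entropy automatically generalizes to more than one random variable. Using this interpretation, we obtain from (19) and (20) for an $m$-rectifiable pair of random variables $(\mathsf{x}, \mathsf{y})$ (i.e., $\mu(\mathsf{x}, \mathsf{y})^{-1} \ll \mathscr{H}^m|_{\mathcal{E}}$ for an $m$-rectifiable set $\mathcal{E}$)

$$\mathfrak{h}^m(\mathsf{x}, \mathsf{y}) = -\mathrm{E}_{(\mathsf{x},\mathsf{y})}\big[\log\theta^m_{(\mathsf{x},\mathsf{y})}(\mathsf{x}, \mathsf{y})\big] \tag{31}$$

$$\begin{aligned}
&= -\int_{\mathbb{R}^M} \log\theta^m_{(\mathsf{x},\mathsf{y})}(x, y)\, d\mu(\mathsf{x}, \mathsf{y})^{-1}(x, y) \\
&= -\int_{\mathbb{R}^M} \theta^m_{(\mathsf{x},\mathsf{y})}(x, y) \log\theta^m_{(\mathsf{x},\mathsf{y})}(x, y)\, d\mathscr{H}^m|_{\mathcal{E}}(x, y)
\end{aligned} \tag{32}$$

with $M = M_1 + M_2$. However, there are still some questions to answer: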

• Assuming that $\mathsf{x}$, $\mathsf{y}$, and $(\mathsf{x}, \mathsf{y})$ are $m_1$-, $m_2$-, and $m$-rectifiable, respectively, is there a relationship between the quantities $\mathfrak{h}^{m_1}(\mathsf{x})$, $\mathfrak{h}^{m_2}(\mathsf{y})$, and $\mathfrak{h}^m(\mathsf{x}, \mathsf{y})$, provided they exist?

• Suppose we have an $m_1$-rectifiable random variable $\mathsf{x}$ and an $m_2$-rectifiable random variable $\mathsf{y}$ on the same probability space. Which additional assumptions ensure that $(\mathsf{x}, \mathsf{y})$ is $(m_1 + m_2)$-rectifiable?

• Conversely, suppose we have an $m$-rectifiable random variable $(\mathsf{x}, \mathsf{y})$. Which additional assumptions ensure that $\mathsf{x}$ and $\mathsf{y}$ are rectifiable?

In what follows, we will provide answers to these questions 
under appropriate conditions on the involved random variables. 

One important shortcoming of Hausdorff measures (in contrast to, e.g., the Lebesgue measure) is that the product of two Hausdorff measures is in general not again a Hausdorff measure. However, our definition of the support of a rectifiable measure in Definition 8 guarantees that the product of two Hausdorff measures restricted to the respective supports is again a Hausdorff measure.

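Lemma 27: Let $\mathsf{x}$ be $m_1$-rectifiable with support $\mathcal{E}_1$, and let $\mathsf{y}$ be $m_2$-rectifiable with support $\mathcal{E}_2$. Then $\mathcal{E}_1 \times \mathcal{E}_2$ is $(m_1+m_2)$-rectifiable and

$$\mathscr{H}^{m_1+m_2}|_{\mathcal{E}_1\times\mathcal{E}_2} = \mathscr{H}^{m_1}|_{\mathcal{E}_1} \times \mathscr{H}^{m_2}|_{\mathcal{E}_2}. \tag{33}$$

Proof: According to Definition 11, we have $\mathcal{E}_1 = \bigcup_{k\in\mathbb{N}} f_k(\mathcal{A}_k)$ and $\mathcal{E}_2 = \bigcup_{k\in\mathbb{N}} g_k(\mathcal{B}_k)$ where, for $k \in \mathbb{N}$, $\mathcal{A}_k$ and $\mathcal{B}_k$ are bounded Borel sets and $f_k$ and $g_k$ are Lipschitz functions that are one-to-one on $\mathcal{A}_k$ and $\mathcal{B}_k$, respectively. By [20, Th. 15.1], the sets $f_k(\mathcal{A}_k)$ and $g_l(\mathcal{B}_l)$ are also Borel sets. Thus, [18, Th. 3.2.23] implies $\mathscr{H}^{m_1+m_2}|_{f_k(\mathcal{A}_k)\times g_l(\mathcal{B}_l)} = \mathscr{H}^{m_1}|_{f_k(\mathcal{A}_k)} \times \mathscr{H}^{m_2}|_{g_l(\mathcal{B}_l)}$ for all $k, l \in \mathbb{N}$. The result (33) then follows by the $\sigma$-additivity of Hausdorff measures. ■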

A. Joint Entropy for Independent Random Variables 

We start our investigation of joint entropy with independent random variables. In this case, it turns out that the $m$-dimensional entropy is additive.

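Theorem 28: Let $\mathsf{x}\colon \Omega \to \mathbb{R}^{M_1}$ and $\mathsf{y}\colon \Omega \to \mathbb{R}^{M_2}$ be independent random variables on a probability space $(\Omega, \mathfrak{S}, \mu)$. Furthermore, let $\mathsf{x}$ be $m_1$-rectifiable with support $\mathcal{E}_1$ and let $\mathsf{y}$ be $m_2$-rectifiable with support $\mathcal{E}_2$. Then the following properties hold:

1) The random variable $(\mathsf{x}, \mathsf{y})\colon \Omega \to \mathbb{R}^{M_1+M_2}$ is $(m_1+m_2)$-rectifiable.

2) The $(m_1+m_2)$-dimensional Hausdorff density of $(\mathsf{x}, \mathsf{y})$ satisfies

$$\theta^{m_1+m_2}_{(\mathsf{x},\mathsf{y})}(x, y) = \theta^{m_1}_{\mathsf{x}}(x)\, \theta^{m_2}_{\mathsf{y}}(y) \tag{34}$$

$\mathscr{H}^{m_1+m_2}|_{\mathcal{E}_1\times\mathcal{E}_2}$-almost everywhere.

3) The set $\mathcal{E}_1 \times \mathcal{E}_2$ is $(m_1+m_2)$-rectifiable and satisfies $\mu(\mathsf{x}, \mathsf{y})^{-1} \ll \mathscr{H}^{m_1+m_2}|_{\mathcal{E}_1\times\mathcal{E}_2}$.

4) If $\mathfrak{h}^{m_1}(\mathsf{x})$ and $\mathfrak{h}^{m_2}(\mathsf{y})$ are finite, then the $(m_1+m_2)$-dimensional entropy of the random variable $(\mathsf{x}, \mathsf{y})$ is given by

$$\mathfrak{h}^{m_1+m_2}(\mathsf{x}, \mathsf{y}) = \mathfrak{h}^{m_1}(\mathsf{x}) + \mathfrak{h}^{m_2}(\mathsf{y}).$$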

Proof: See Appendix D. ■ 

A corollary of Theorem 28 is a result for finite sequences 
of independent random variables. Such sequences will be 
important for our discussion of typical sets in Section VIII. 
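Corollary 29: Let $\mathsf{x}_{1:n} = (\mathsf{x}_1, \dots, \mathsf{x}_n)$ be a finite sequence of independent random variables, where $\mathsf{x}_i \in \mathbb{R}^{M_i}$, $i \in \{1, \dots, n\}$ is $m_i$-rectifiable with support $\mathcal{E}_i$ and $m_i$-dimensional Hausdorff density $\theta^{m_i}_{\mathsf{x}_i}$. Then $\mathsf{x}_{1:n}$ is an $m$-rectifiable random variable on $\mathbb{R}^M$, where $m = \sum_{i=1}^n m_i$ and $M = \sum_{i=1}^n M_i$. Furthermore, the set $\mathcal{E} = \mathcal{E}_1 \times \cdots \times \mathcal{E}_n$ is $m$-rectifiable and satisfies $\mu(\mathsf{x}_{1:n})^{-1} \ll \mathscr{H}^m|_{\mathcal{E}}$. Moreover, the $m$-dimensional Hausdorff density of $\mathsf{x}_{1:n}$ is given by

$$\theta^m_{\mathsf{x}_{1:n}}(x_{1:n}) = \prod_{i=1}^{n} \theta^{m_i}_{\mathsf{x}_i}(x_i).$$

Finally, if $\mathfrak{h}^{m_i}(\mathsf{x}_i)$ is finite for $i \in \{1, \dots, n\}$, then

$$\mathfrak{h}^m(\mathsf{x}_{1:n}) = \sum_{i=1}^{n} \mathfrak{h}^{m_i}(\mathsf{x}_i). \tag{35}$$

Proof: The corollary follows by inductively applying Theorem 28 to the two random variables $(\mathsf{x}_1, \dots, \mathsf{x}_{i-1})$ and $\mathsf{x}_i$. ■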

B. Dependent Random Variables

The case of dependent random variables is more involved. The rectifiability of $\mathsf{x}$ and $\mathsf{y}$ does not necessarily imply the rectifiability of $(\mathsf{x}, \mathsf{y})$. This is expected, since the marginal distributions carry only a small part of the information carried by the joint distribution. In general, even for continuous random variables $\mathsf{x}$ and $\mathsf{y}$, we cannot calculate the joint differential entropy $h(\mathsf{x}, \mathsf{y})$ from the mere knowledge of the differential entropies $h(\mathsf{x})$ and $h(\mathsf{y})$. However, it is always possible to bound the differential entropy according to [13, eq. (8.63)]

$$h(\mathsf{x}, \mathsf{y}) \leq h(\mathsf{x}) + h(\mathsf{y}). \tag{36}$$

In general, no bound resembling (36) holds for our entropy definition. The following simple setting provides a counterexample.

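Example 30: We continue our example of a random variable on the unit circle (see Section IV-D) for the special case of a uniform distribution of $\mathsf{z}$ on $[0, 2\pi)$. From (24), we obtain

$$\mathfrak{h}^1(\mathsf{x}, \mathsf{y}) = h(\mathsf{z}) = \log(2\pi). \tag{37}$$

We can now analyze the components¹¹ $\mathsf{x}$ and $\mathsf{y}$ of the random variable $(\mathsf{x}\ \mathsf{y})^T = (\cos\mathsf{z}\ \sin\mathsf{z})^T$. One can easily see that $\mathsf{x}$ is a continuous random variable whose probability density function is given by $f_{\mathsf{x}}(x) = 1/(\pi\sqrt{1 - x^2})$ for $x \in (-1, 1)$. By symmetry, the same holds for $\mathsf{y}$, i.e., $f_{\mathsf{y}}(y) = 1/(\pi\sqrt{1 - y^2})$. Basic calculus then yields for the differential entropy of $\mathsf{x}$ and $\mathsf{y}$

$$h(\mathsf{x}) = h(\mathsf{y}) = \log\Big(\frac{\pi}{2}\Big). \tag{38}$$

Since $\mathsf{x}$ and $\mathsf{y}$ are continuous random variables, it follows from Theorem 23 that $\mathfrak{h}^1(\mathsf{x}) = h(\mathsf{x})$ and $\mathfrak{h}^1(\mathsf{y}) = h(\mathsf{y})$. Thus,

$$\mathfrak{h}^1(\mathsf{x}) + \mathfrak{h}^1(\mathsf{y}) = 2\log\Big(\frac{\pi}{2}\Big) < \log(2\pi).$$

Comparing with (37), we see that $\mathfrak{h}^1(\mathsf{x}, \mathsf{y}) > \mathfrak{h}^1(\mathsf{x}) + \mathfrak{h}^1(\mathsf{y})$. ■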
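The values in (37) and (38) can be reproduced numerically. The Python sketch below is a Monte Carlo illustration, and its sample size and seed are arbitrary. It estimates $h(\mathsf{x}) = -\mathrm{E}[\log f_{\mathsf{x}}(\mathsf{x})]$ for $\mathsf{x} = \cos\mathsf{z}$, using $1 - \cos^2 z = \sin^2 z$ to avoid evaluating the density at $\pm 1$. It then compares $2\log(\pi/2) \approx 0.903$ with $\log(2\pi) \approx 1.838$.

```python
import math
import random

def arcsine_entropy_mc(n=200000, seed=2):
    """Monte Carlo estimate of h(x) for x = cos z, z uniform on [0, 2*pi).
    With f_x(x) = 1/(pi*sqrt(1 - x^2)) and 1 - cos(z)^2 = sin(z)^2,
    -log f_x(x) = log(pi * |sin z|)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.uniform(0.0, 2.0 * math.pi)
        total += math.log(math.pi * abs(math.sin(z)))
    return total / n

h_marginal_mc = arcsine_entropy_mc()
h_marginal = math.log(math.pi / 2)   # (38)
h_joint = math.log(2 * math.pi)      # (37)
print(h_marginal_mc, h_marginal)
print(2 * h_marginal, "<", h_joint)
```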

¹¹ To conform with the notation $(\mathsf{x}, \mathsf{y})$ used in our treatment of joint entropy, we change the component notation from $(\mathsf{x}_1\ \mathsf{x}_2)^T$ to $(\mathsf{x}\ \mathsf{y})^T$.


The reason for this seemingly unintuitive behavior of our entropy lies in the geometric properties of the projection $p_y\colon \mathbb{R}^{M_1+M_2} \to \mathbb{R}^{M_2}$, $p_y(x, y) = y$, i.e., the projection of $(x, y)$ to the last $M_2$ components. Although $p_y$ is linear and has a Jacobian determinant $J_{p_y}$ of 1 everywhere on $\mathbb{R}^{M_1+M_2}$, things get more involved once we consider $p_y$ as a mapping between rectifiable sets. We then want to calculate the Jacobian determinant $J^{\mathcal{E}}_{p_y}$ of the tangential differential of $p_y$, which maps an $m$-rectifiable set $\mathcal{E} \subseteq \mathbb{R}^{M_1+M_2}$ to an $m_2$-rectifiable set $\mathcal{E}_2 \subseteq \mathbb{R}^{M_2}$ [18, Sec. 3.2.16]. In this setting, $J^{\mathcal{E}}_{p_y}$ is not necessarily constant and may also become zero. Thus, the marginalization of an $m$-dimensional Hausdorff density is not as easy as the marginalization of a probability density function. The following theorem shows how to marginalize Hausdorff densities and describes the implications for $m$-dimensional entropy.

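Theorem 31: Let $(\mathsf{x}, \mathsf{y}) \in \mathbb{R}^{M_1+M_2}$ be an $m$-rectifiable random variable ($m \leq M_1 + M_2$) with $m$-dimensional Hausdorff density $\theta^m_{(\mathsf{x},\mathsf{y})}$ and support $\mathcal{E}$. Furthermore, let $\mathcal{E}_2 = p_y(\mathcal{E}) \subseteq \mathbb{R}^{M_2}$ be $m_2$-rectifiable ($m_2 \leq m$, $m_2 \leq M_2$), $\mathscr{H}^{m_2}(\mathcal{E}_2) < \infty$, and $J^{\mathcal{E}}_{p_y} > 0$ $\mathscr{H}^m|_{\mathcal{E}}$-almost everywhere. Then the following properties hold:

1) The random variable $\mathsf{y}$ is $m_2$-rectifiable.

2) There exists a support $\tilde{\mathcal{E}}_2 \subseteq \mathcal{E}_2$ of $\mathsf{y}$.

3) The $m_2$-dimensional Hausdorff density of $\mathsf{y}$ is given by

$$\theta^{m_2}_{\mathsf{y}}(y) = \int_{\mathcal{E}_y} \frac{\theta^m_{(\mathsf{x},\mathsf{y})}(x, y)}{J^{\mathcal{E}}_{p_y}(x, y)}\, d\mathscr{H}^{m-m_2}(x) \tag{39}$$

$\mathscr{H}^{m_2}$-almost everywhere, where $\mathcal{E}_y = \{x \in \mathbb{R}^{M_1} : (x, y) \in \mathcal{E}\}$.

4) An expression of the $m_2$-dimensional entropy of $\mathsf{y}$ is given by

$$\mathfrak{h}^{m_2}(\mathsf{y}) = -\int_{\mathcal{E}} \theta^m_{(\mathsf{x},\mathsf{y})}(x, y) \log\theta^{m_2}_{\mathsf{y}}(y)\, d\mathscr{H}^m(x, y) \tag{40}$$

provided the integral on the right-hand side exists and is finite.

Analogous results hold for $\mathsf{x}$ under the assumptions that $\mathcal{E}_1 = p_x(\mathcal{E})$ is $m_1$-rectifiable ($m_1 \leq m$, $m_1 \leq M_1$), $\mathscr{H}^{m_1}(\mathcal{E}_1) < \infty$, and $J^{\mathcal{E}}_{p_x} > 0$ $\mathscr{H}^m|_{\mathcal{E}}$-almost everywhere.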

Proof: See Appendix E. ■ 

We will illustrate the main findings of Theorem 31 in the 
setting of Example 30. 

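Example 32: As in Example 30, we consider $(\mathsf{x}, \mathsf{y}) \in \mathbb{R}^2$ uniformly distributed on the unit circle $\mathcal{S}_1$. By (23), $\theta^1_{(\mathsf{x},\mathsf{y})}(x, y) = 1/(2\pi)$ $\mathscr{H}^1$-almost everywhere on $\mathcal{S}_1$. In Example 30, we already obtained $\mathfrak{h}^1(\mathsf{y}) = \log(\pi/2)$. There, we used the fact that $\mathsf{y}$ is a continuous random variable and that, by Theorem 23, $\mathfrak{h}^1(\mathsf{y}) = h(\mathsf{y})$. Let us now calculate $\mathfrak{h}^1(\mathsf{y})$ using Theorem 31. Note first that $p_y(\mathcal{S}_1) = [-1, 1]$, which is 1-rectifiable and satisfies $\mathscr{H}^1([-1, 1]) = 2 < \infty$. Next, we calculate the Jacobian determinant $J^{\mathcal{S}_1}_{p_y}(x, y)$. Consider an arbitrary point on the unit circle, which can always be expressed as $(\pm\sqrt{1 - y^2}, \pm y)$ with $y \in [0, 1]$. At that point, the projection $p_y$ restricted to the tangent space of $\mathcal{S}_1$ amounts to a multiplication by the factor $\sqrt{1 - y^2}$. Indeed, the unit tangent vector at $(\cos\varphi, \sin\varphi)$ is $(-\sin\varphi, \cos\varphi)$, whose second component has magnitude $|\cos\varphi| = \sqrt{1 - y^2}$.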
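Thus, $J^{\mathcal{S}_1}_{p_y}(\pm\sqrt{1 - y^2}, \pm y) = \sqrt{1 - y^2}$. Hence, we obtain from (40)

$$\begin{aligned}
\mathfrak{h}^1(\mathsf{y}) &= -\int_{\mathcal{S}_1} \theta^1_{(\mathsf{x},\mathsf{y})}(x, y) \log\bigg(\int_{\mathcal{E}_y} \frac{\theta^1_{(\mathsf{x},\mathsf{y})}(x', y)}{J^{\mathcal{S}_1}_{p_y}(x', y)}\, d\mathscr{H}^0(x')\bigg) d\mathscr{H}^1(x, y) \\
&\overset{(a)}{=} -\int_{\mathcal{S}_1} \frac{1}{2\pi} \log\bigg(\sum_{x' \in \mathcal{E}_y} \frac{1}{2\pi\sqrt{1 - y^2}}\bigg) d\mathscr{H}^1(x, y) \\
&\overset{(b)}{=} -\int_{\mathcal{S}_1} \frac{1}{2\pi} \log\bigg(\frac{1}{\pi\sqrt{1 - y^2}}\bigg) d\mathscr{H}^1(x, y) \\
&= -\frac{1}{2\pi}\int_0^{2\pi} \log\bigg(\frac{1}{\pi|\cos\varphi|}\bigg) d\varphi \\
&= \log\Big(\frac{\pi}{2}\Big)
\end{aligned} \tag{41}$$

where (a) holds because $\mathscr{H}^0$ is the counting measure, and (b) holds because $\mathcal{E}_y = \{x \in \mathbb{R} : (x, y) \in \mathcal{S}_1\} = \{\sqrt{1 - y^2}, -\sqrt{1 - y^2}\}$ contains two points for all $y \in (-1, 1)$. Note that our above result for $\mathfrak{h}^1(\mathsf{y})$ coincides with the result previously obtained in Example 30. ■

C. Product-Compatible Random Variables

There are special settings in which $m$-dimensional entropy more closely matches the behavior we know from (differential) entropy. In these cases, the three random variables $\mathsf{x}$, $\mathsf{y}$, and $(\mathsf{x}, \mathsf{y})$ are rectifiable with “matching” dimensions, and we will see that an inequality similar to (36) holds.

Definition 33: Let $\mathsf{x}$ be an $m_1$-rectifiable random variable on $\mathbb{R}^{M_1}$ with support $\mathcal{E}_1$, and let $\mathsf{y}$ be an $m_2$-rectifiable random variable on $\mathbb{R}^{M_2}$ with support $\mathcal{E}_2$. The random variables $\mathsf{x}$ and $\mathsf{y}$ are called product-compatible if $(\mathsf{x}, \mathsf{y})$ is an $(m_1+m_2)$-rectifiable random variable on $\mathbb{R}^{M_1+M_2}$.

It is easy to see that for product-compatible random variables $\mathsf{x}$ and $\mathsf{y}$, $\mu(\mathsf{x}, \mathsf{y})^{-1} \ll \mathscr{H}^{m_1+m_2}|_{\mathcal{E}_1\times\mathcal{E}_2}$. Thus, by Property 4 in Corollary 12, there exists a support $\mathcal{E} \subseteq \mathcal{E}_1 \times \mathcal{E}_2$.

The most important part of Definition 33 is that the dimensions of $\mathsf{x}$ and $\mathsf{y}$ add up to the joint dimension of $(\mathsf{x}, \mathsf{y})$. Note that this was not the case in Example 32, where $\mathsf{x}$ and $\mathsf{y}$ “shared” the dimension $m = 1$ of $(\mathsf{x}, \mathsf{y})$. A simple example of product-compatible random variables is the case of an $m_1$-rectifiable random variable $\mathsf{x}$ and an independent $m_2$-rectifiable random variable $\mathsf{y}$. Indeed, by Theorem 28, $(\mathsf{x}, \mathsf{y})$ is $(m_1+m_2)$-rectifiable.

Another example of product-compatible random variables can be deduced from Theorem 31. Let $(\mathsf{x}, \mathsf{y})$ be $(m_1+m_2)$-rectifiable with support $\mathcal{E}$. Assume that $\mathcal{E}_2 = p_y(\mathcal{E}) \subseteq \mathbb{R}^{M_2}$ is $m_2$-rectifiable, $\mathscr{H}^{m_2}(\mathcal{E}_2) < \infty$, and $J^{\mathcal{E}}_{p_y} > 0$ $\mathscr{H}^{m_1+m_2}|_{\mathcal{E}}$-almost everywhere. Furthermore, assume that $\mathcal{E}_1 = p_x(\mathcal{E}) \subseteq \mathbb{R}^{M_1}$ is $m_1$-rectifiable, $\mathscr{H}^{m_1}(\mathcal{E}_1) < \infty$, and $J^{\mathcal{E}}_{p_x} > 0$ $\mathscr{H}^{m_1+m_2}|_{\mathcal{E}}$-almost everywhere. By Theorem 31, $\mathsf{x}$ is $m_1$-rectifiable and $\mathsf{y}$ is $m_2$-rectifiable. Thus, $\mathsf{x}$ and $\mathsf{y}$ are product-compatible.

The setting of product-compatible random variables will be especially important for our discussion of mutual information in Section VII. However, we already obtain some useful results for joint entropy.

Theorem 34: Let $\mathsf{x}$ be an $m_1$-rectifiable random variable on $\mathbb{R}^{M_1}$ with support $\mathcal{E}_1$, and let $\mathsf{y}$ be an $m_2$-rectifiable random variable on $\mathbb{R}^{M_2}$ with support $\mathcal{E}_2$. Furthermore, let $\mathsf{x}$ and $\mathsf{y}$ be product-compatible. Denote by $\theta^{m_1+m_2}_{(\mathsf{x},\mathsf{y})}$ the $(m_1+m_2)$-dimensional Hausdorff density of $(\mathsf{x}, \mathsf{y})$ and by $\mathcal{E} \subseteq \mathcal{E}_1 \times \mathcal{E}_2$ a support of $(\mathsf{x}, \mathsf{y})$. Then the following properties hold:

1) The $m_2$-dimensional Hausdorff density of $\mathsf{y}$ is given by

$$\theta^{m_2}_{\mathsf{y}}(y) = \int_{\mathcal{E}_1} \theta^{m_1+m_2}_{(\mathsf{x},\mathsf{y})}(x, y)\, d\mathscr{H}^{m_1}(x)$$

$\mathscr{H}^{m_2}|_{\mathcal{E}_2}$-almost everywhere.

2) An expression of the $m_2$-dimensional entropy of $\mathsf{y}$ is given by

$$\mathfrak{h}^{m_2}(\mathsf{y}) = -\int_{\mathcal{E}} \theta^{m_1+m_2}_{(\mathsf{x},\mathsf{y})}(x, y) \log\theta^{m_2}_{\mathsf{y}}(y)\, d\mathscr{H}^{m_1+m_2}(x, y)$$

provided the integral on the right-hand side exists and is finite.

Due to symmetry, analogous properties hold for $\theta^{m_1}_{\mathsf{x}}$ and $\mathfrak{h}^{m_1}(\mathsf{x})$.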

Proof: The proof follows along the lines of the proof 
of Theorem 31 in Appendix E. However, due to the product- 
compatibility of x and y, one can use Fubini’s theorem in place 
of (110). ■ 

For product-compatible random variables, the inequality $\mathfrak{h}^{m_1+m_2}(\mathsf{x}, \mathsf{y}) \leq \mathfrak{h}^{m_1}(\mathsf{x}) + \mathfrak{h}^{m_2}(\mathsf{y})$ also holds. However, the proof of this inequality will be much easier once we have considered the mutual information between rectifiable random variables. Thus, we postpone a formal presentation of the inequality to Corollary 47 in Section VII.

VI. Conditional Entropy 

In contrast to joint entropy, conditional entropy is a nontrivial extension of entropy. We would like to define the entropy for a random variable $\mathsf{x}$ on $\mathbb{R}^{M_1}$ under the condition that a dependent random variable $\mathsf{y}$ on $\mathbb{R}^{M_2}$ is known. For discrete random variables, and under appropriate assumptions also for continuous random variables, the distribution of $(\mathsf{x} \mid \mathsf{y} = y)$ is well defined. So is the associated entropy $H(\mathsf{x} \mid \mathsf{y} = y)$ or differential entropy $h(\mathsf{x} \mid \mathsf{y} = y)$. Averaging over all $y$ then results in the well-known definitions of conditional entropy $H(\mathsf{x} \mid \mathsf{y})$, involving only the probability mass functions $p_{(\mathsf{x},\mathsf{y})}$ and $p_{\mathsf{y}}$, or of conditional differential entropy $h(\mathsf{x} \mid \mathsf{y})$, involving only the probability density functions $f_{(\mathsf{x},\mathsf{y})}$ and $f_{\mathsf{y}}$. Indeed, if $\mathsf{x}$ and $\mathsf{y}$ are discrete random variables, we have
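$$\begin{aligned}
H(\mathsf{x}\,|\,\mathsf{y}) &= \sum_{j\in\mathbb{N}} p_{\mathsf{y}}(y_j)\, H(\mathsf{x}\,|\,\mathsf{y}=y_j) \\
&= -\sum_{i\in\mathbb{N}}\sum_{j\in\mathbb{N}} p_{(\mathsf{x},\mathsf{y})}(x_i,y_j) \log\frac{p_{(\mathsf{x},\mathsf{y})}(x_i,y_j)}{p_{\mathsf{y}}(y_j)} \\
&= -\mathrm{E}_{(\mathsf{x},\mathsf{y})}\!\left[\log\frac{p_{(\mathsf{x},\mathsf{y})}(\mathsf{x},\mathsf{y})}{p_{\mathsf{y}}(\mathsf{y})}\right]
\end{aligned} \tag{42}$$

and, if x and y are continuous random variables, we have

$$\begin{aligned}
h(\mathsf{x}\,|\,\mathsf{y}) &= \int_{\mathbb{R}^{M_2}} f_{\mathsf{y}}(y)\, h(\mathsf{x}\,|\,\mathsf{y}=y)\, dy \\
&= -\int_{\mathbb{R}^{M_1+M_2}} f_{(\mathsf{x},\mathsf{y})}(x,y) \log\frac{f_{(\mathsf{x},\mathsf{y})}(x,y)}{f_{\mathsf{y}}(y)}\, d(x,y) \\
&= -\mathrm{E}_{(\mathsf{x},\mathsf{y})}\!\left[\log\frac{f_{(\mathsf{x},\mathsf{y})}(\mathsf{x},\mathsf{y})}{f_{\mathsf{y}}(\mathsf{y})}\right].
\end{aligned} \tag{43}$$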
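As a quick numerical sanity check of the discrete identity (42), the following Python sketch evaluates $H(\mathsf{x}\,|\,\mathsf{y})$ both as the average of the entropies $H(\mathsf{x}\,|\,\mathsf{y}=y_j)$ and as $-\mathrm{E}[\log(p_{(\mathsf{x},\mathsf{y})}/p_{\mathsf{y}})]$. The joint pmf `p_xy` is an arbitrary illustrative choice, not taken from the text, and entropies are in nats to match the natural logarithm used here.

```python
import numpy as np

# Hypothetical joint pmf p_(x,y)(x_i, y_j): rows index x_i, columns index y_j.
p_xy = np.array([[0.10, 0.20, 0.05],
                 [0.25, 0.05, 0.35]])
p_y = p_xy.sum(axis=0)  # marginal pmf p_y(y_j)

def entropy(p):
    """Entropy in nats of a pmf; zero-probability terms contribute 0."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# First expression in (42): average of H(x | y = y_j) weighted by p_y(y_j).
h_avg = sum(p_y[j] * entropy(p_xy[:, j] / p_y[j]) for j in range(len(p_y)))

# Last expression in (42): -E[log(p_(x,y)(x, y) / p_y(y))].
mask = p_xy > 0
ratio = p_xy / p_y[np.newaxis, :]
h_expect = -np.sum(p_xy[mask] * np.log(ratio[mask]))

print(h_avg, h_expect)  # the two expressions agree up to rounding
```

The same comparison for (43) would require numerical integration of a joint density; the discrete case already shows that the averaged form and the expectation form are the same quantity.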


A straightforward generalization to rectifiable measures would 
be to mimic the right-hand sides of (42) and (43) using 
Hausdorff densities. However, it will turn out that this naive 
approach is only partly correct: due to the geometric subtleties 
of the projection discussed in Section V-B, we may have to 
include a correction term that reflects the geometry of the 
conditioning process. 


A. Conditional Probability 

For general random variables x and y, we recall the concept of conditional probabilities, which can be summarized as follows (a detailed account can be found in [24, Ch. 5]): For a pair of random variables (x, y) on $\mathbb{R}^{M_1+M_2}$, there exists a regular conditional probability $\Pr\{\mathsf{x} \in \mathcal{A} \,|\, \mathsf{y} = y\}$, i.e., for each measurable set $\mathcal{A} \subseteq \mathbb{R}^{M_1}$, the function $y \mapsto \Pr\{\mathsf{x} \in \mathcal{A} \,|\, \mathsf{y} = y\}$ is measurable and $\Pr\{\mathsf{x} \in \cdot \,|\, \mathsf{y} = y\}$ defines a probability measure for each $y \in \mathbb{R}^{M_2}$. Furthermore, the regular conditional probability $\Pr\{\mathsf{x} \in \mathcal{A} \,|\, \mathsf{y} = y\}$ satisfies

$$\Pr\{(\mathsf{x},\mathsf{y}) \in \mathcal{A}_1 \times \mathcal{A}_2\} = \int_{\mathcal{A}_2} \Pr\{\mathsf{x} \in \mathcal{A}_1 \,|\, \mathsf{y} = y\}\, d\mu\mathsf{y}^{-1}(y). \tag{44}$$

The regular conditional probability $\Pr\{\mathsf{x} \in \mathcal{A} \,|\, \mathsf{y} = y\}$ involved in (44) is not unique. Nevertheless, we can still use (44) in a definition of conditional entropy because any version of the regular conditional probability satisfies (44). For the remainder of this section, we consider a fixed version of the regular conditional probability $\Pr\{\mathsf{x} \in \mathcal{A} \,|\, \mathsf{y} = y\}$.


B. Definition of Conditional Entropy 

In order to define a conditional entropy $\mathfrak{h}^{m-m_2}(\mathsf{x}\,|\,\mathsf{y})$, we first show that $\Pr\{\mathsf{x} \in \cdot \,|\, \mathsf{y} = y\}$ is a rectifiable measure. The next theorem establishes sufficient conditions such that $\Pr\{\mathsf{x} \in \cdot \,|\, \mathsf{y} = y\}$ is rectifiable for almost every $y$. As before, we denote by $p_{\mathsf{y}}\colon \mathbb{R}^{M_1+M_2} \to \mathbb{R}^{M_2}$ the projection of $\mathbb{R}^{M_1+M_2}$ to the last $M_2$ components, i.e., $p_{\mathsf{y}}(x,y) = y$.

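Theorem 35: Let (x, y) be an $m$-rectifiable random variable on $\mathbb{R}^{M_1+M_2}$ with $m$-dimensional Hausdorff density $\theta^{m}_{(\mathsf{x},\mathsf{y})}$ and support $\mathcal{E}$. Furthermore, let $\mathcal{E}_2 = p_{\mathsf{y}}(\mathcal{E}) \subseteq \mathbb{R}^{M_2}$ be $m_2$-rectifiable ($m_2 \le m$, $m_2 \le M_2$, $m - m_2 \le M_1$), $\mathcal{H}^{m_2}(\mathcal{E}_2) < \infty$, and $J^{\mathcal{E}}_{p_{\mathsf{y}}} > 0$ $\mathcal{H}^{m}|_{\mathcal{E}}$-almost everywhere. Then the following properties hold:

1) The measure $\Pr\{\mathsf{x} \in \cdot \,|\, \mathsf{y} = y\}$ is $(m-m_2)$-rectifiable for $\mathcal{H}^{m_2}|_{\tilde{\mathcal{E}}_2}$-almost every $y \in \mathbb{R}^{M_2}$, where $\tilde{\mathcal{E}}_2 \subseteq \mathcal{E}_2$ is a support¹² of y.

2) The $(m-m_2)$-dimensional Hausdorff density of the measure $\Pr\{\mathsf{x} \in \cdot \,|\, \mathsf{y} = y\}$ is given by

$$\theta^{m-m_2}_{\Pr\{\mathsf{x}\in\cdot\,|\,\mathsf{y}=y\}}(x) = \frac{\theta^{m}_{(\mathsf{x},\mathsf{y})}(x,y)}{J^{\mathcal{E}}_{p_{\mathsf{y}}}(x,y)\, \theta^{m_2}_{\mathsf{y}}(y)} \tag{45}$$

$\mathcal{H}^{m-m_2}|_{\mathcal{E}^{(y)}}$-almost everywhere, for $\mathcal{H}^{m_2}|_{\tilde{\mathcal{E}}_2}$-almost every $y \in \mathbb{R}^{M_2}$. Here, as before, $\mathcal{E}^{(y)} = \{x \in \mathbb{R}^{M_1} : (x,y) \in \mathcal{E}\}$.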

Proof: See Appendix F. ■ 

As for joint entropy, the case of product-compatible random variables (see Definition 33) is of special interest and results in a more intuitive characterization of the Hausdorff density of $\Pr\{\mathsf{x} \in \cdot \,|\, \mathsf{y} = y\}$.

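Theorem 36: Let x be an $m_1$-rectifiable random variable on $\mathbb{R}^{M_1}$ with support $\mathcal{E}_1$, and let y be an $m_2$-rectifiable random variable on $\mathbb{R}^{M_2}$ with support $\mathcal{E}_2$. Furthermore, let x and y be product-compatible. Then the following properties hold:

1) The measure $\Pr\{\mathsf{x} \in \cdot \,|\, \mathsf{y} = y\}$ is $m_1$-rectifiable for $\mathcal{H}^{m_2}|_{\mathcal{E}_2}$-almost every $y \in \mathbb{R}^{M_2}$.

2) The $m_1$-dimensional Hausdorff density of $\Pr\{\mathsf{x} \in \cdot \,|\, \mathsf{y} = y\}$ is given by

$$\theta^{m_1}_{\Pr\{\mathsf{x}\in\cdot\,|\,\mathsf{y}=y\}}(x) = \frac{\theta^{m_1+m_2}_{(\mathsf{x},\mathsf{y})}(x,y)}{\theta^{m_2}_{\mathsf{y}}(y)} \tag{46}$$

$\mathcal{H}^{m_1}|_{\mathcal{E}_1}$-almost everywhere, for $\mathcal{H}^{m_2}|_{\mathcal{E}_2}$-almost every $y \in \mathbb{R}^{M_2}$.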

Proof: The proof follows along the lines of the proof of Theorem 35 in Appendix F. However, due to the product-compatibility of x and y, one can use Fubini’s theorem in place of (110). ■

Note that Theorems 35 and 36 hold for any version of the regular conditional probability $\Pr\{\mathsf{x} \in \mathcal{A} \,|\, \mathsf{y} = y\}$. However, for different versions, the statement “for $\mathcal{H}^{m_2}|_{\mathcal{E}_2}$-almost every $y \in \mathbb{R}^{M_2}$” may refer to different sets of $\mathcal{H}^{m_2}|_{\mathcal{E}_2}$-measure zero; e.g., (45) may hold for different $y \in \mathbb{R}^{M_2}$. Thus, results that are independent of the version of the regular conditional probability can only be obtained if we can avoid these “almost everywhere” statements. To this end, we will define conditional entropy as an expectation over y.

Definition 37: Let (x, y) be an $m$-rectifiable random variable on $\mathbb{R}^{M_1+M_2}$ such that y is $m_2$-rectifiable with $m_2$-dimensional Hausdorff density $\theta^{m_2}_{\mathsf{y}}$ and support $\mathcal{E}_2$. The conditional entropy of x given y is defined as¹³

$$\mathfrak{h}^{m-m_2}(\mathsf{x}\,|\,\mathsf{y}) = -\int_{\mathcal{E}_2} \theta^{m_2}_{\mathsf{y}}(y) \int_{\mathbb{R}^{M_1}} \theta^{m-m_2}_{\Pr\{\mathsf{x}\in\cdot\,|\,\mathsf{y}=y\}}(x) \log \theta^{m-m_2}_{\Pr\{\mathsf{x}\in\cdot\,|\,\mathsf{y}=y\}}(x)\, d\mathcal{H}^{m-m_2}(x)\, d\mathcal{H}^{m_2}(y) \tag{47}$$

provided the right-hand side of (47) exists and coincides for all versions of the regular conditional probability $\Pr\{\mathsf{x} \in \mathcal{A} \,|\, \mathsf{y} = y\}$.

12 By Theorem 31, the random variable y is $m_2$-rectifiable with Hausdorff density $\theta^{m_2}_{\mathsf{y}}$ (given by (39)) and some support $\tilde{\mathcal{E}}_2 \subseteq \mathcal{E}_2$.

13 The inner integral in (47) can be intuitively interpreted as an entropy $\mathfrak{h}^{m-m_2}(\mathsf{x} \,|\, \mathsf{y} = y)$. However, such an entropy is not well defined in general and depends on the choice of the conditional probability.

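Remark 38: For independent random variables x and y, inserting (34) into (46) implies that $\theta^{m_1}_{\Pr\{\mathsf{x}\in\cdot\,|\,\mathsf{y}=y\}}(x) = \theta^{m_1}_{\mathsf{x}}(x)$. Thus, (47) reduces to $\mathfrak{h}^{m_1}(\mathsf{x}\,|\,\mathsf{y}) = \mathfrak{h}^{m_1}(\mathsf{x})$.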

The following theorem gives a characterization of conditional entropy and sufficient conditions for (47) to be well defined in the sense that the right-hand side of (47) coincides for all versions of the regular conditional probability $\Pr\{\mathsf{x} \in \mathcal{A} \,|\, \mathsf{y} = y\}$.

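Theorem 39: Let (x, y) be an $m$-rectifiable random variable on $\mathbb{R}^{M_1+M_2}$ with $m$-dimensional Hausdorff density $\theta^{m}_{(\mathsf{x},\mathsf{y})}$ and support $\mathcal{E}$. Furthermore, let $\mathcal{E}_2 = p_{\mathsf{y}}(\mathcal{E})$ be $m_2$-rectifiable, $\mathcal{H}^{m_2}(\mathcal{E}_2) < \infty$, and $J^{\mathcal{E}}_{p_{\mathsf{y}}} > 0$ $\mathcal{H}^{m}|_{\mathcal{E}}$-almost everywhere. Then

$$\mathfrak{h}^{m-m_2}(\mathsf{x}\,|\,\mathsf{y}) = -\mathrm{E}_{(\mathsf{x},\mathsf{y})}\!\left[\log\frac{\theta^{m}_{(\mathsf{x},\mathsf{y})}(\mathsf{x},\mathsf{y})}{\theta^{m_2}_{\mathsf{y}}(\mathsf{y})}\right] + \mathrm{E}_{(\mathsf{x},\mathsf{y})}\big[\log J^{\mathcal{E}}_{p_{\mathsf{y}}}(\mathsf{x},\mathsf{y})\big] \tag{48}$$

provided the right-hand side of (48) exists and is finite.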

Proof: See Appendix G. ■ 

Note the difference between (48) and the expressions (42) and (43) of $H(\mathsf{x}\,|\,\mathsf{y})$ and $h(\mathsf{x}\,|\,\mathsf{y})$, respectively: in the case of rectifiable random variables, we generally have to include the geometric correction term $\mathrm{E}_{(\mathsf{x},\mathsf{y})}[\log J^{\mathcal{E}}_{p_{\mathsf{y}}}(\mathsf{x},\mathsf{y})]$. However, we will show next that, in the special case of product-compatible rectifiable random variables, this correction term does not appear.

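Theorem 40: Let the $m_1$-rectifiable random variable x on $\mathbb{R}^{M_1}$ and the $m_2$-rectifiable random variable y on $\mathbb{R}^{M_2}$ be product-compatible. Then

$$\mathfrak{h}^{m_1}(\mathsf{x}\,|\,\mathsf{y}) = -\mathrm{E}_{(\mathsf{x},\mathsf{y})}\!\left[\log\frac{\theta^{m_1+m_2}_{(\mathsf{x},\mathsf{y})}(\mathsf{x},\mathsf{y})}{\theta^{m_2}_{\mathsf{y}}(\mathsf{y})}\right] \tag{49}$$

provided the right-hand side of (49) exists and is finite.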

Proof: The proof follows along the lines of the proof of Theorem 39 in Appendix G. However, due to the product-compatibility of x and y, one can use Fubini’s theorem in place of (110). ■

C. Chain Rule for Rectifiable Random Variables 

As in the case of entropy and differential entropy, we can 
give a chain rule for m-dimensional entropy. 

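Theorem 41: Let (x, y) be an $m$-rectifiable random variable on $\mathbb{R}^{M_1+M_2}$ with $m$-dimensional Hausdorff density $\theta^{m}_{(\mathsf{x},\mathsf{y})}$ and support $\mathcal{E}$. Furthermore, let $\mathcal{E}_2 = p_{\mathsf{y}}(\mathcal{E})$ be $m_2$-rectifiable, $\mathcal{H}^{m_2}(\mathcal{E}_2) < \infty$, and $J^{\mathcal{E}}_{p_{\mathsf{y}}} > 0$ $\mathcal{H}^{m}|_{\mathcal{E}}$-almost everywhere. Then

$$\mathfrak{h}^{m}(\mathsf{x},\mathsf{y}) = \mathfrak{h}^{m_2}(\mathsf{y}) + \mathfrak{h}^{m-m_2}(\mathsf{x}\,|\,\mathsf{y}) - \mathrm{E}_{(\mathsf{x},\mathsf{y})}\big[\log J^{\mathcal{E}}_{p_{\mathsf{y}}}(\mathsf{x},\mathsf{y})\big] \tag{50}$$

provided the corresponding integrals exist and are finite.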


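Proof: By the definition of $\mathfrak{h}^{m}(\mathsf{x},\mathsf{y})$ in (31) and the definition of $\mathfrak{h}^{m_2}(\mathsf{y})$ in (19), we have

$$\begin{aligned}
&\mathfrak{h}^{m}(\mathsf{x},\mathsf{y}) - \mathfrak{h}^{m_2}(\mathsf{y}) + \mathrm{E}_{(\mathsf{x},\mathsf{y})}\big[\log J^{\mathcal{E}}_{p_{\mathsf{y}}}(\mathsf{x},\mathsf{y})\big] \\
&\quad= -\mathrm{E}_{(\mathsf{x},\mathsf{y})}\big[\log\theta^{m}_{(\mathsf{x},\mathsf{y})}(\mathsf{x},\mathsf{y})\big] + \mathrm{E}_{\mathsf{y}}\big[\log\theta^{m_2}_{\mathsf{y}}(\mathsf{y})\big] + \mathrm{E}_{(\mathsf{x},\mathsf{y})}\big[\log J^{\mathcal{E}}_{p_{\mathsf{y}}}(\mathsf{x},\mathsf{y})\big] \\
&\quad= -\mathrm{E}_{(\mathsf{x},\mathsf{y})}\!\left[\log\frac{\theta^{m}_{(\mathsf{x},\mathsf{y})}(\mathsf{x},\mathsf{y})}{\theta^{m_2}_{\mathsf{y}}(\mathsf{y})}\right] + \mathrm{E}_{(\mathsf{x},\mathsf{y})}\big[\log J^{\mathcal{E}}_{p_{\mathsf{y}}}(\mathsf{x},\mathsf{y})\big].
\end{aligned} \tag{51}$$

Because we assumed in the theorem that the integrals corresponding to the terms on the left-hand side of (51) are finite, the right-hand side of (51) is also finite. By (48), the right-hand side of (51) equals $\mathfrak{h}^{m-m_2}(\mathsf{x}\,|\,\mathsf{y})$. Thus, (50) holds. ■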

Next, we continue Examples 30 and 32 from Section V-B. We will see that the geometric correction term in the chain rule, $\mathrm{E}_{(\mathsf{x},\mathsf{y})}[\log J^{\mathcal{E}}_{p_{\mathsf{y}}}(\mathsf{x},\mathsf{y})]$, is indeed necessary.

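Example 42: As in Examples 30 and 32, we consider $(\mathsf{x},\mathsf{y}) \in \mathbb{R}^2$ uniformly distributed on the unit circle $\mathcal{S}_1$, i.e., $\theta^{1}_{(\mathsf{x},\mathsf{y})}(x,y) = 1/(2\pi)$ $\mathcal{H}^1$-almost everywhere on $\mathcal{S}_1$. According to (41),

$$\mathfrak{h}^{1}(\mathsf{y}) = \log\Big(\frac{\pi}{2}\Big) \tag{52}$$

and according to (37),

$$\mathfrak{h}^{1}(\mathsf{x},\mathsf{y}) = \log(2\pi). \tag{53}$$

To calculate the conditional entropy $\mathfrak{h}^{0}(\mathsf{x}\,|\,\mathsf{y})$ (note that $m - m_2 = 1 - 1 = 0$), we consider the regular conditional probability $\Pr\{\mathsf{x} \in \mathcal{A} \,|\, \mathsf{y} = y\}$. It is easy to see that one possible version of $\Pr\{\mathsf{x} \in \mathcal{A} \,|\, \mathsf{y} = y\}$ is the following: for $y \in (-1,1)$, $\Pr\{\mathsf{x} = x \,|\, \mathsf{y} = y\} = 1/2$ for $x = \pm\sqrt{1-y^2}$ and $\Pr\{\mathsf{x} \in \mathcal{A} \,|\, \mathsf{y} = y\} = 0$ if $\pm\sqrt{1-y^2} \notin \mathcal{A}$. The probabilities for $|y| \ge 1$ are irrelevant because $\Pr\{\mathsf{y} \notin (-1,1)\} = 0$. Hence, by (47), we obtain

$$\begin{aligned}
\mathfrak{h}^{0}(\mathsf{x}\,|\,\mathsf{y}) &= -\int_{(-1,1)} \theta^{1}_{\mathsf{y}}(y) \int_{\{\pm\sqrt{1-y^2}\}} \frac{1}{2}\log\frac{1}{2}\, d\mathcal{H}^{0}(x)\, d\mathcal{H}^{1}(y) \\
&= \int_{(-1,1)} \theta^{1}_{\mathsf{y}}(y)\, \log 2\, d\mathcal{H}^{1}(y) \\
&= \log 2.
\end{aligned} \tag{54}$$

This differs from $\mathfrak{h}^{1}(\mathsf{x},\mathsf{y}) - \mathfrak{h}^{1}(\mathsf{y}) = \log(2\pi) - \log(\pi/2) = \log 4$, and therefore the conjecture that a chain rule holds without a correction term is wrong. To calculate the correction term, which according to (50) is given by $\mathrm{E}_{(\mathsf{x},\mathsf{y})}[\log J^{\mathcal{S}_1}_{p_{\mathsf{y}}}(\mathsf{x},\mathsf{y})]$, we recall from Example 32 that $J^{\mathcal{S}_1}_{p_{\mathsf{y}}}(\pm\sqrt{1-y^2}, y) = \sqrt{1-y^2}$ or, more conveniently, $J^{\mathcal{S}_1}_{p_{\mathsf{y}}}(\cos\phi, \sin\phi) = |\cos\phi|$. Thus, we obtain

$$\begin{aligned}
\mathrm{E}_{(\mathsf{x},\mathsf{y})}\big[\log J^{\mathcal{S}_1}_{p_{\mathsf{y}}}(\mathsf{x},\mathsf{y})\big] &= \int_{\mathcal{S}_1} \frac{1}{2\pi} \log J^{\mathcal{S}_1}_{p_{\mathsf{y}}}(x,y)\, d\mathcal{H}^{1}(x,y) \\
&= \int_{0}^{2\pi} \frac{1}{2\pi}\log|\cos\phi|\, d\phi \\
&= -\log 2.
\end{aligned} \tag{55}$$

We finally verify that (55) is consistent with the chain rule (50). Starting from (53), we obtain

$$\begin{aligned}
\mathfrak{h}^{1}(\mathsf{x},\mathsf{y}) &= \log(2\pi) \\
&= \log\Big(\frac{\pi}{2}\Big) + \log 2 - (-\log 2) \\
&= \mathfrak{h}^{1}(\mathsf{y}) + \mathfrak{h}^{0}(\mathsf{x}\,|\,\mathsf{y}) - \mathrm{E}_{(\mathsf{x},\mathsf{y})}\big[\log J^{\mathcal{S}_1}_{p_{\mathsf{y}}}(\mathsf{x},\mathsf{y})\big]
\end{aligned}$$

where the final expansion is obtained by using (52), (54), and (55). ■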
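The quantities in Example 42 can also be checked numerically. The Python sketch below (using NumPy, an assumed tooling choice) parametrizes the circle by the angle $\phi$, evaluates the expectations behind (52) and (55) with a midpoint rule, and confirms that the chain rule (50), including the correction term, reproduces $\log(2\pi)$. It also evaluates $\mathfrak{h}^{1}(\mathsf{x})$ (by symmetry equal to $\log(\pi/2)$) so that it can be compared with $\mathfrak{h}^{0}(\mathsf{x}\,|\,\mathsf{y}) = \log 2$. The grid size `N` is an arbitrary choice.

```python
import numpy as np

# Midpoint rule over phi in [0, 2*pi): (x, y) = (cos phi, sin phi) is uniform on the
# unit circle, so phi is uniform with density 1/(2*pi).
N = 1_000_000  # divisible by 4, so no midpoint hits cos(phi) = 0 or sin(phi) = 0
phi = (np.arange(N) + 0.5) * 2 * np.pi / N
log_abs_cos = np.log(np.abs(np.cos(phi)))

# Correction term (55): E[log J(x, y)] with J(cos phi, sin phi) = |cos phi|.
correction = log_abs_cos.mean()

# h^1(y): y = sin phi has density 1/(pi*sqrt(1 - y^2)) = 1/(pi*|cos phi|) on (-1, 1).
h_y = -np.mean(-np.log(np.pi) - log_abs_cos)

# h^1(x): x = cos phi has density 1/(pi*|sin phi|) on (-1, 1).
h_x = np.log(np.pi) + np.mean(np.log(np.abs(np.sin(phi))))

h_xy = np.log(2 * np.pi)    # (53): Hausdorff density 1/(2*pi) on the circle
h_x_given_y = np.log(2.0)   # (54): two equally likely values of x for each y

print(h_y, correction)                          # approx. log(pi/2) and -log 2
print(h_y + h_x_given_y - correction, h_xy)     # chain rule (50): both approx. log(2*pi)
print(h_x, h_x_given_y)                         # h^1(x) is smaller than h^0(x | y)
```

The midpoint rule is adequate here because the logarithmic singularities of $\log|\cos\phi|$ are integrable and fall on cell boundaries.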

Example 42 also provides a counterexample to the rule “conditioning does not increase entropy,” which holds for the entropy of discrete random variables and the differential entropy of continuous random variables. Indeed, comparing (38) and (54), we see that for the components of a uniform distribution on the unit circle, we have $\mathfrak{h}^{1}(\mathsf{x}) < \mathfrak{h}^{0}(\mathsf{x}\,|\,\mathsf{y})$. However, as we will see in Corollary 47 in Section VII, this is only due to a “reduction of dimensions”: if x and y are product-compatible, which implies that $\mathfrak{h}^{m_1}(\mathsf{x})$ and $\mathfrak{h}^{m_1}(\mathsf{x}\,|\,\mathsf{y})$ are of the same dimension $m_1 = m - m_2$, conditioning will indeed not increase entropy, i.e., $\mathfrak{h}^{m_1}(\mathsf{x}\,|\,\mathsf{y}) \le \mathfrak{h}^{m_1}(\mathsf{x})$. Also the chain rule (50) reduces to its traditional form, as stated next.

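Theorem 43: Let the $m_1$-rectifiable random variable x on $\mathbb{R}^{M_1}$ and the $m_2$-rectifiable random variable y on $\mathbb{R}^{M_2}$ be product-compatible. Then

$$\mathfrak{h}^{m_1+m_2}(\mathsf{x},\mathsf{y}) = \mathfrak{h}^{m_2}(\mathsf{y}) + \mathfrak{h}^{m_1}(\mathsf{x}\,|\,\mathsf{y}) \tag{56}$$

provided the entropies $\mathfrak{h}^{m_1+m_2}(\mathsf{x},\mathsf{y})$ and $\mathfrak{h}^{m_2}(\mathsf{y})$ exist and are finite.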

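Proof: By the definition of $\mathfrak{h}^{m_1+m_2}(\mathsf{x},\mathsf{y})$ in (31) and the definition of $\mathfrak{h}^{m_2}(\mathsf{y})$ in (19), we have

$$\begin{aligned}
\mathfrak{h}^{m_1+m_2}(\mathsf{x},\mathsf{y}) - \mathfrak{h}^{m_2}(\mathsf{y}) &= -\mathrm{E}_{(\mathsf{x},\mathsf{y})}\big[\log\theta^{m_1+m_2}_{(\mathsf{x},\mathsf{y})}(\mathsf{x},\mathsf{y})\big] + \mathrm{E}_{\mathsf{y}}\big[\log\theta^{m_2}_{\mathsf{y}}(\mathsf{y})\big] \\
&= -\mathrm{E}_{(\mathsf{x},\mathsf{y})}\!\left[\log\frac{\theta^{m_1+m_2}_{(\mathsf{x},\mathsf{y})}(\mathsf{x},\mathsf{y})}{\theta^{m_2}_{\mathsf{y}}(\mathsf{y})}\right].
\end{aligned} \tag{57}$$

By (49), the right-hand side of (57) equals $\mathfrak{h}^{m_1}(\mathsf{x}\,|\,\mathsf{y})$. Thus, (56) holds. ■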

Using an induction argument, we can extend the chain 
rule (56) to a sequence of random variables. 

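Corollary 44: Let $\mathsf{x}_{1:n} = (\mathsf{x}_1,\dots,\mathsf{x}_n)$ be a sequence of random variables where each $\mathsf{x}_i \in \mathbb{R}^{M_i}$ is $m_i$-rectifiable. Assume that $\mathsf{x}_{1:i-1}$ and $\mathsf{x}_i$ are product-compatible for $i \in \{2,\dots,n\}$. Then

$$\mathfrak{h}^{m}(\mathsf{x}_{1:n}) = \mathfrak{h}^{m_1}(\mathsf{x}_1) + \sum_{i=2}^{n} \mathfrak{h}^{m_i}(\mathsf{x}_i \,|\, \mathsf{x}_{1:i-1}) \tag{58}$$

with $m = \sum_{i=1}^{n} m_i$, provided the corresponding integrals exist and are finite.

We note that, consistently with Remark 38, (35) is a special case of (58).


VII. Mutual Information


The basic definition of mutual information is for discrete random variables x and y with probability mass functions $p_{\mathsf{x}}(x_i)$ and $p_{\mathsf{y}}(y_j)$ and joint probability mass function $p_{(\mathsf{x},\mathsf{y})}(x_i,y_j)$. The mutual information between x and y is given by [13, eq. (2.28)]

$$I(\mathsf{x};\mathsf{y}) = \sum_{i,j} p_{(\mathsf{x},\mathsf{y})}(x_i,y_j) \log\!\left(\frac{p_{(\mathsf{x},\mathsf{y})}(x_i,y_j)}{p_{\mathsf{x}}(x_i)\, p_{\mathsf{y}}(y_j)}\right). \tag{59}$$

However, mutual information is also defined between arbitrary random variables x and y on a common probability space. This definition is based on (59) and quantizations $[\mathsf{x}]_{\mathcal{Q}}$ and $[\mathsf{y}]_{\mathcal{R}}$ [13, eq. (8.54)]. We recall from Section II-A that for a measurable, finite partition $\mathcal{Q} = \{\mathcal{A}_1,\dots,\mathcal{A}_N\}$ of $\mathbb{R}^{M_1}$ (i.e., $\mathbb{R}^{M_1} = \bigcup_{i=1}^{N} \mathcal{A}_i$ with $\mathcal{A}_i \in \mathcal{Q}$ mutually disjoint and measurable), the quantization $[\mathsf{x}]_{\mathcal{Q}} \in \{1,\dots,N\}$ is defined as the discrete random variable with probability mass function $p_{[\mathsf{x}]_{\mathcal{Q}}}(i) = \Pr\{[\mathsf{x}]_{\mathcal{Q}} = i\} = \Pr\{\mathsf{x} \in \mathcal{A}_i\}$ for $i \in \{1,\dots,N\}$.

Definition 45 ([13, eq. (8.54)]): Let $\mathsf{x}\colon \Omega \to \mathbb{R}^{M_1}$ and $\mathsf{y}\colon \Omega \to \mathbb{R}^{M_2}$ be random variables on a common probability space $(\Omega, \mathfrak{S}, \mu)$. The mutual information between x and y is defined as

$$I(\mathsf{x};\mathsf{y}) = \sup_{\mathcal{Q},\mathcal{R}} I([\mathsf{x}]_{\mathcal{Q}}; [\mathsf{y}]_{\mathcal{R}})$$

where the supremum is taken over all measurable, finite partitions $\mathcal{Q}$ of $\mathbb{R}^{M_1}$ and $\mathcal{R}$ of $\mathbb{R}^{M_2}$.

The Gelfand-Yaglom-Perez theorem [25, Lem. 5.2.3] provides an expression of mutual information in terms of Radon-Nikodym derivatives: for random variables $\mathsf{x}\colon \Omega \to \mathbb{R}^{M_1}$ and $\mathsf{y}\colon \Omega \to \mathbb{R}^{M_2}$ on a common probability space $(\Omega, \mathfrak{S}, \mu)$,

$$I(\mathsf{x};\mathsf{y}) = \int_{\mathbb{R}^{M_1+M_2}} \log\!\left(\frac{d\mu(\mathsf{x},\mathsf{y})^{-1}}{d(\mu\mathsf{x}^{-1}\times\mu\mathsf{y}^{-1})}(x,y)\right) d\mu(\mathsf{x},\mathsf{y})^{-1}(x,y) \tag{60}$$

if $\mu(\mathsf{x},\mathsf{y})^{-1} \ll \mu\mathsf{x}^{-1} \times \mu\mathsf{y}^{-1}$, and

$$I(\mathsf{x};\mathsf{y}) = \infty \tag{61}$$

if $\mu(\mathsf{x},\mathsf{y})^{-1} \not\ll \mu\mathsf{x}^{-1} \times \mu\mathsf{y}^{-1}$.

For the special cases of discrete and continuous random variables, there exist expressions of mutual information in terms of entropy and differential entropy, respectively. We will extend these expressions to the case of rectifiable random variables. The resulting generalization will involve the entropies $\mathfrak{h}^{m_1}(\mathsf{x})$, $\mathfrak{h}^{m_2}(\mathsf{y})$, and $\mathfrak{h}^{m}(\mathsf{x},\mathsf{y})$.

Theorem 46: Let x be an $m_1$-rectifiable random variable with support $\mathcal{E}_1 \subseteq \mathbb{R}^{M_1}$, let y be an $m_2$-rectifiable random variable with support $\mathcal{E}_2 \subseteq \mathbb{R}^{M_2}$, and let (x, y) be $m$-rectifiable with support $\mathcal{E} \subseteq \mathcal{E}_1 \times \mathcal{E}_2$. The mutual information $I(\mathsf{x};\mathsf{y})$ satisfies:

1) If x and y are product-compatible (i.e., $m = m_1 + m_2$), then

$$I(\mathsf{x};\mathsf{y}) = \int_{\mathcal{E}} \theta^{m}_{(\mathsf{x},\mathsf{y})}(x,y) \log\!\left(\frac{\theta^{m}_{(\mathsf{x},\mathsf{y})}(x,y)}{\theta^{m_1}_{\mathsf{x}}(x)\, \theta^{m_2}_{\mathsf{y}}(y)}\right) d\mathcal{H}^{m}(x,y). \tag{62}$$

Furthermore,

$$I(\mathsf{x};\mathsf{y}) = \mathfrak{h}^{m_1}(\mathsf{x}) + \mathfrak{h}^{m_2}(\mathsf{y}) - \mathfrak{h}^{m}(\mathsf{x},\mathsf{y}) \tag{63}$$

and

$$I(\mathsf{x};\mathsf{y}) = \mathfrak{h}^{m_1}(\mathsf{x}) - \mathfrak{h}^{m_1}(\mathsf{x}\,|\,\mathsf{y}) = \mathfrak{h}^{m_2}(\mathsf{y}) - \mathfrak{h}^{m_2}(\mathsf{y}\,|\,\mathsf{x}) \tag{64}$$

provided the entropies $\mathfrak{h}^{m_1}(\mathsf{x})$, $\mathfrak{h}^{m_2}(\mathsf{y})$, and $\mathfrak{h}^{m}(\mathsf{x},\mathsf{y})$ exist and are finite.

2) If $m < m_1 + m_2$, then $I(\mathsf{x};\mathsf{y}) = \infty$.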

Proof: See Appendix H. ■ 

In Theorem 46, the case $m < m_1 + m_2$ can be interpreted as x and y “sharing” at least one dimension. In a communication scenario, this would imply that it is possible to reconstruct an at least one-dimensional component of x from y (and, also, to reconstruct an at least one-dimensional component of y from x). Thus, an infinite amount of information could be transmitted over a channel $\mathsf{x} \to \mathsf{y}$ (or $\mathsf{y} \to \mathsf{x}$). This is consistent with our result that $I(\mathsf{x};\mathsf{y}) = \infty$.

A corollary of Theorem 46 states that for product-compatible random variables, we can upper-bound the joint entropy by the sum of the individual entropies and prove that conditioning does not increase entropy.

Corollary 47: Let the $m_1$-rectifiable random variable x on $\mathbb{R}^{M_1}$ and the $m_2$-rectifiable random variable y on $\mathbb{R}^{M_2}$ be product-compatible. Then

$$\mathfrak{h}^{m_1+m_2}(\mathsf{x},\mathsf{y}) \le \mathfrak{h}^{m_1}(\mathsf{x}) + \mathfrak{h}^{m_2}(\mathsf{y}) \tag{65}$$

and

$$\mathfrak{h}^{m_1}(\mathsf{x}\,|\,\mathsf{y}) \le \mathfrak{h}^{m_1}(\mathsf{x}) \tag{66}$$

provided the entropies $\mathfrak{h}^{m_1}(\mathsf{x})$, $\mathfrak{h}^{m_2}(\mathsf{y})$, and $\mathfrak{h}^{m_1+m_2}(\mathsf{x},\mathsf{y})$ exist and are finite.

Proof: The inequality (65) follows from (63) and the nonnegativity of mutual information. Similarly, (66) follows from (64) and the nonnegativity of mutual information. ■
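For discrete random variables, (63) to (66) reduce to their classical counterparts, which makes them easy to check. A minimal Python sketch with a made-up joint pmf (an illustrative assumption, entropies in nats) evaluates (59) directly and compares it with the entropy expressions:

```python
import numpy as np

def entropy(p):
    """Entropy in nats of an array of probabilities (zeros are skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def mutual_information(p_xy):
    """Mutual information (59) of a joint pmf given as a 2-D array."""
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log((p_xy / (p_x * p_y))[mask])))

# Hypothetical joint pmf of two dependent discrete random variables.
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.55]])
h_x = entropy(p_xy.sum(axis=1))
h_y = entropy(p_xy.sum(axis=0))
h_xy = entropy(p_xy)
h_x_given_y = h_xy - h_y  # discrete chain rule

I = mutual_information(p_xy)
print(I, h_x + h_y - h_xy, h_x - h_x_given_y)   # (59), (63), and (64) agree
print(h_xy <= h_x + h_y, h_x_given_y <= h_x)    # (65) and (66)
```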

VIII. Asymptotic Equipartition Property 

Similar to classical entropy and differential entropy, the $m$-dimensional entropy $\mathfrak{h}^{m}(\mathsf{x})$ satisfies an asymptotic equipartition property (AEP). Let us consider a sequence $\mathsf{x}_{1:n} = (\mathsf{x}_1,\dots,\mathsf{x}_n)$ of i.i.d. random variables $\mathsf{x}_i$. Our main findings are similar to the discrete and continuous cases: based on $\mathfrak{h}^{m}(\mathsf{x})$, we define sets $\mathcal{A}^{(n)}_{\varepsilon}$ of typical sequences $x_{1:n}$ and show that, for sufficiently large $n$, a random sequence $\mathsf{x}_{1:n}$ belongs to $\mathcal{A}^{(n)}_{\varepsilon}$ with probability arbitrarily close to one. Furthermore, we obtain upper and lower bounds on the size of $\mathcal{A}^{(n)}_{\varepsilon}$ given by $e^{n(\mathfrak{h}^{m}(\mathsf{x})+\varepsilon)}$ and $(1-\delta)\,e^{n(\mathfrak{h}^{m}(\mathsf{x})-\varepsilon)}$, respectively. In the case of classical entropy and differential entropy, these properties are useful in the proof of various coding theorems because they allow us to consider only typical sequences.

Our analysis follows the steps in [13, Sec. 8.2]. However, whereas in the discrete case the size of a set of sequences $x_{1:n}$ is measured by its cardinality and in the continuous case by its Lebesgue measure, in the present case of $m$-rectifiable random variables $\mathsf{x}_i$, we resort to the Hausdorff measure.

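Lemma 48: Let $\mathsf{x}_{1:n} = (\mathsf{x}_1,\dots,\mathsf{x}_n)$ be a sequence of i.i.d. $m$-rectifiable random variables $\mathsf{x}_i$ on $\mathbb{R}^{M}$, where each $\mathsf{x}_i$ has $m$-dimensional Hausdorff density $\theta^{m}_{\mathsf{x}}$ and $m$-dimensional entropy $\mathfrak{h}^{m}(\mathsf{x})$. The random variable $-(1/n)\sum_{i=1}^{n}\log\theta^{m}_{\mathsf{x}}(\mathsf{x}_i)$ converges to $\mathfrak{h}^{m}(\mathsf{x})$ in probability, i.e., for any $\varepsilon > 0$

$$\lim_{n\to\infty} \Pr\left\{ \left| -\frac{1}{n}\sum_{i=1}^{n}\log\theta^{m}_{\mathsf{x}}(\mathsf{x}_i) - \mathfrak{h}^{m}(\mathsf{x}) \right| > \varepsilon \right\} = 0.$$

Proof: By (19), we have $\mathfrak{h}^{m}(\mathsf{x}) = -\mathrm{E}_{\mathsf{x}}[\log\theta^{m}_{\mathsf{x}}(\mathsf{x})]$, and by the weak law of large numbers, the sample mean $-(1/n)\sum_{i=1}^{n}\log\theta^{m}_{\mathsf{x}}(\mathsf{x}_i)$ converges in probability to the expectation $-\mathrm{E}_{\mathsf{x}}[\log\theta^{m}_{\mathsf{x}}(\mathsf{x})]$. ■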
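Lemma 48 can be illustrated by simulation. As an assumed example (not from the text), take a 1-rectifiable random variable on the unit circle in $\mathbb{R}^2$ whose angle $\phi$ has density $g(\phi) = \phi/(2\pi^2)$ on $[0, 2\pi)$. Because arc length on the unit circle equals angle, its Hausdorff density is $\theta^{1}_{\mathsf{x}}(x) = g(\phi)$, and a short calculation gives $\mathfrak{h}^{1}(\mathsf{x}) = \log\pi + 1/2$. In the Python sketch below (NumPy assumed), the sample mean $-(1/n)\sum_i \log\theta^{1}_{\mathsf{x}}(\mathsf{x}_i)$ should approach this value as $n$ grows; the seed and sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_circle(n):
    """Draw n points (cos phi, sin phi) with phi ~ g(phi) = phi / (2 pi^2) on [0, 2 pi)."""
    phi = 2 * np.pi * np.sqrt(rng.random(n))   # inverse CDF: F(phi) = phi^2 / (4 pi^2)
    return np.column_stack((np.cos(phi), np.sin(phi)))

def theta(points):
    """Hausdorff density w.r.t. arc length: g evaluated at the angle of each point."""
    phi = np.mod(np.arctan2(points[:, 1], points[:, 0]), 2 * np.pi)
    return phi / (2 * np.pi ** 2)

h_true = np.log(np.pi) + 0.5   # closed form of h^1(x) for this hypothetical g
for n in (10, 1_000, 100_000):
    sample_entropy = -np.mean(np.log(theta(sample_circle(n))))
    print(n, sample_entropy, h_true)
```

Since $\mathrm{Var}[\log\theta^{1}_{\mathsf{x}}(\mathsf{x})] = 1/4$ for this density, the deviation shrinks roughly like $0.5/\sqrt{n}$.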

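We can define typical sets in the usual way [13, Sec. 8.2].

Definition 49: Let x be an $m$-rectifiable random variable on $\mathbb{R}^{M}$ with support $\mathcal{E}$ and $m$-dimensional Hausdorff density $\theta^{m}_{\mathsf{x}}$. For $\varepsilon > 0$ and $n \in \mathbb{N}$, the $\varepsilon$-typical set $\mathcal{A}^{(n)}_{\varepsilon} \subseteq \mathbb{R}^{nM}$ is defined as

$$\mathcal{A}^{(n)}_{\varepsilon} \triangleq \left\{ x_{1:n} \in \mathcal{E}^{n} : \left| -\frac{1}{n}\sum_{i=1}^{n}\log\theta^{m}_{\mathsf{x}}(x_i) - \mathfrak{h}^{m}(\mathsf{x}) \right| \le \varepsilon \right\}.$$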

The AEP for sequences of m-rectifiable random variables 
is expressed by the following central result. 

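Theorem 50: Let $\mathsf{x}_{1:n} = (\mathsf{x}_1,\dots,\mathsf{x}_n)$ be a sequence of i.i.d. $m$-rectifiable random variables $\mathsf{x}_i$ on $\mathbb{R}^{M}$, where each $\mathsf{x}_i$ has $m$-dimensional Hausdorff density $\theta^{m}_{\mathsf{x}}$, support $\mathcal{E}$, and $m$-dimensional entropy $\mathfrak{h}^{m}(\mathsf{x})$. Then the typical set $\mathcal{A}^{(n)}_{\varepsilon}$ satisfies the following properties.

1) For $\delta > 0$ and $n$ sufficiently large,
$$\Pr\{\mathsf{x}_{1:n} \in \mathcal{A}^{(n)}_{\varepsilon}\} > 1 - \delta.$$

2) For all $n \in \mathbb{N}$,
$$\mathcal{H}^{nm}(\mathcal{A}^{(n)}_{\varepsilon}) \le e^{n(\mathfrak{h}^{m}(\mathsf{x})+\varepsilon)}.$$

3) For $\delta > 0$ and $n$ sufficiently large,
$$\mathcal{H}^{nm}(\mathcal{A}^{(n)}_{\varepsilon}) \ge (1-\delta)\, e^{n(\mathfrak{h}^{m}(\mathsf{x})-\varepsilon)}.$$

Proof: The proof is similar to that in the continuous case [13, Th. 8.2.2], however with the Lebesgue measure replaced by the Hausdorff measure. ■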
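Property 1) of Theorem 50 can be checked by Monte Carlo. As a hypothetical example, consider again a source on the unit circle whose angle has density $\phi/(2\pi^2)$ on $[0,2\pi)$, so that $\theta^{1}_{\mathsf{x}}$ depends on a point only through its angle and $\mathfrak{h}^{1}(\mathsf{x}) = \log\pi + 1/2$. The sketch (Python with NumPy) therefore works directly with the angles and estimates $\Pr\{\mathsf{x}_{1:n} \in \mathcal{A}^{(n)}_{\varepsilon}\}$ from Definition 49; the values of `eps`, `trials`, and the sequence lengths are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
h_true = np.log(np.pi) + 0.5   # h^1(x) for the hypothetical angle density phi / (2 pi^2)
eps = 0.05

def fraction_typical(n, trials=2_000):
    """Monte Carlo estimate of Pr{x_{1:n} in A_eps^(n)} from Definition 49."""
    phi = 2 * np.pi * np.sqrt(rng.random((trials, n)))   # angles of the circle points
    log_theta = np.log(phi / (2 * np.pi ** 2))            # log Hausdorff density
    sample_entropy = -log_theta.mean(axis=1)
    return np.mean(np.abs(sample_entropy - h_true) <= eps)

for n in (10, 100, 1_000):
    print(n, fraction_typical(n))
```

The estimated probability grows toward one with $n$, as stated in property 1); properties 2) and 3) concern Hausdorff measures of $\mathcal{A}^{(n)}_{\varepsilon}$ and are not checked here.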

IX. Entropy Bounds on Expected Codeword 
Length 

A well-known result for discrete random variables is a connection between the minimal expected codeword length of an instantaneous source code and the entropy of the random variable [13, Th. 5.4.1]. More specifically, let x be a discrete random variable on $\mathbb{R}^{M}$ with possible realizations $\{x_i : i \in \mathcal{I}\}$. In variable-length source coding, a one-to-one function $f\colon \{x_i : i \in \mathcal{I}\} \to \{0,1\}^*$, where $\{0,1\}^*$ denotes the set of all finite-length binary sequences, is used to represent each realization $x_i$ by a finite-length binary sequence $s_i = f(x_i)$. This code is instantaneous (or prefix free) if no $f(x_i)$ coincides with the first bits of another $f(x_j)$. The expected binary codeword length is defined as

$$L_f(\mathsf{x}) = \mathrm{E}_{\mathsf{x}}[\ell(f(\mathsf{x}))]$$

where $\ell(s)$ denotes the length of a binary sequence $s \in \{0,1\}^*$. The minimal expected binary codeword length $L^*(\mathsf{x})$ is defined as the minimum of $L_f(\mathsf{x})$ over the set of all possible instantaneous codes $f$. By [13, Th. 5.4.1], $L^*(\mathsf{x})$ satisfies¹⁴

$$H(\mathsf{x})\,\mathrm{ld}\,e \le L^*(\mathsf{x}) < H(\mathsf{x})\,\mathrm{ld}\,e + 1. \tag{67}$$
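Huffman coding is a standard way to obtain an optimal instantaneous code, so its expected length equals $L^*(\mathsf{x})$. The following Python sketch builds binary Huffman codeword lengths for an arbitrary example pmf (an illustrative assumption) and checks (67), with the entropy converted to bits, together with the Kraft equality of the resulting code.

```python
import heapq
import math

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given pmf."""
    if len(probs) == 1:
        return [1]
    # Heap entries: (subtree probability, tie-breaker, symbols in the subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:   # merging two subtrees adds one bit to every symbol in them
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

probs = [0.4, 0.2, 0.2, 0.1, 0.06, 0.04]   # hypothetical pmf of a discrete x
lengths = huffman_lengths(probs)
L_star = sum(p * l for p, l in zip(probs, lengths))   # expected length in bits
H_bits = -sum(p * math.log2(p) for p in probs)        # H(x) ld e, i.e., entropy in bits
print(H_bits, L_star, H_bits + 1)                     # checks (67)
```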

A. Expected Codeword Length of an Integer-Dimensional 
Random Variable 

For a nondiscrete $m$-rectifiable random variable x (i.e., $m \ge 1$), a one-to-one code of finite expected codeword length does not exist. However, quantizations of x can be encoded using finite-length binary sequences. We will present results for the minimal expected codeword length of constrained quantizations of x.

Definition 51: Let $\mathcal{E} \subseteq \mathbb{R}^{M}$ be an $m$-rectifiable set. Furthermore, let $\mathcal{Q} = \{\mathcal{A}_1,\dots,\mathcal{A}_N\}$ be a finite $\mathcal{H}^{m}$-measurable partition of $\mathcal{E}$, i.e., all sets $\mathcal{A}_i$ are mutually disjoint and $\mathcal{H}^{m}$-measurable, and $\bigcup_{i=1}^{N}\mathcal{A}_i = \mathcal{E}$. The partition $\mathcal{Q}$ is said to be an $(m,\delta)$-partition of $\mathcal{E}$ if $\mathcal{H}^{m}(\mathcal{A}_i) \le \delta$ for all $i \in \{1,\dots,N\}$. The set of all $(m,\delta)$-partitions of $\mathcal{E}$ is denoted $\mathcal{P}^{(\mathcal{E})}_{m,\delta}$.

Note that the definition of an $(m,\delta)$-partition of an $m$-rectifiable set $\mathcal{E}$ does not involve a distortion function. On the one hand, this is convenient because we do not have to argue about a good distortion measure. On the other hand, the points in a set $\mathcal{A}_i$ of a partition $\mathcal{Q} \in \mathcal{P}^{(\mathcal{E})}_{m,\delta}$ are not necessarily “close” to each other; in fact, $\mathcal{A}_i$ is not even necessarily connected. Thus, although the partitions in $\mathcal{P}^{(\mathcal{E})}_{m,\delta}$ consist of measure-theoretically small sets, these sets might be considered large in terms of specific distortion measures.

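In what follows, we will consider the quantized random variable $[\mathsf{x}]_{\mathcal{Q}}$ for $\mathcal{Q} \in \mathcal{P}^{(\mathcal{E})}_{m,\delta}$. We recall that $[\mathsf{x}]_{\mathcal{Q}}$ is the discrete random variable such that $\Pr\{[\mathsf{x}]_{\mathcal{Q}} = i\} = \Pr\{\mathsf{x} \in \mathcal{A}_i\}$ for $i \in \{1,\dots,N\}$. Due to the interpretation of $\mathfrak{h}^{m}(\mathsf{x})$ as a generalized entropy (cf. Remark 19), we can use [12, eq. (1.8)] to obtain the following result.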

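Lemma 52: Let x be an $m$-rectifiable random variable, i.e., $\mu\mathsf{x}^{-1} \ll \mathcal{H}^{m}|_{\mathcal{E}}$ for an $m$-rectifiable set $\mathcal{E} \subseteq \mathbb{R}^{M}$, with $m \ge 1$ and $\mathcal{H}^{m}(\mathcal{E}) < \infty$. Let $\mathcal{P}^{(\mathcal{E})}_{m,\infty}$ denote the set of all finite, $\mathcal{H}^{m}$-measurable partitions of $\mathcal{E}$. Then

$$\begin{aligned}
\mathfrak{h}^{m}(\mathsf{x}) &= \inf_{\mathcal{Q}\in\mathcal{P}^{(\mathcal{E})}_{m,\infty}} \left( -\sum_{\mathcal{A}\in\mathcal{Q}} \mu\mathsf{x}^{-1}(\mathcal{A}) \log\frac{\mu\mathsf{x}^{-1}(\mathcal{A})}{\mathcal{H}^{m}|_{\mathcal{E}}(\mathcal{A})} \right) \qquad \text{(68)} \\
&= \inf_{\mathcal{Q}\in\mathcal{P}^{(\mathcal{E})}_{m,\infty}} \left( H([\mathsf{x}]_{\mathcal{Q}}) + \sum_{\mathcal{A}\in\mathcal{Q}} \mu\mathsf{x}^{-1}(\mathcal{A}) \log\mathcal{H}^{m}|_{\mathcal{E}}(\mathcal{A}) \right). \qquad \text{(69)}
\end{aligned}$$

Proof: See Appendix I. ■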

The terms in (69) give an interesting interpretation of $m$-dimensional entropy. Looking for a quantization that minimizes the first term, $H([\mathsf{x}]_{\mathcal{Q}})$, corresponds to minimizing the amount of data required to represent this quantization. Of course, the minimum is simply obtained for the partition $\mathcal{Q} = \{\mathcal{E}\}$, which gives $H([\mathsf{x}]_{\mathcal{Q}}) = 0$. But in (69), we also have an additional term that penalizes a bad “resolution” of the quantization: if x falls with high probability (corresponding to $\mu\mathsf{x}^{-1}(\mathcal{A})$ being large) into a large quantization set $\mathcal{A}$, then this is penalized by the term $\mu\mathsf{x}^{-1}(\mathcal{A}) \log\mathcal{H}^{m}|_{\mathcal{E}}(\mathcal{A})$. Thus, (69) shows that $m$-dimensional entropy can be interpreted in terms of a tradeoff between fine resolution and efficient representation.

14 The factor $\mathrm{ld}\,e$ appears because we defined entropy using the natural logarithm.
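To see the tradeoff in (69) at work, consider a hypothetical continuous source on $\mathcal{E} = [0,1]$ ($m = M = 1$) with density $f(x) = 2x$, for which $\mathfrak{h}^{1}(\mathsf{x}) = 1/2 - \log 2$. For a partition into $N$ equal cells, the bracketed expression in (69) equals $H([\mathsf{x}]_{\mathcal{Q}}) - \log N$. The Python sketch below evaluates it for dyadic refinements; consistent with the infimum in (69), the values should stay above $\mathfrak{h}^{1}(\mathsf{x})$ and decrease toward it.

```python
import numpy as np

h_true = 0.5 - np.log(2)   # h^1(x) for the hypothetical density f(x) = 2x on [0, 1]

def objective(num_cells):
    """H([x]_Q) + sum_A mu x^{-1}(A) log H^1(A) from (69) for num_cells equal cells."""
    edges = np.linspace(0.0, 1.0, num_cells + 1)
    p = np.diff(edges ** 2)              # Pr{x in A_i} = F(b) - F(a) with F(x) = x^2
    p = p[p > 0]
    data_term = -np.sum(p * np.log(p))   # H([x]_Q): cost of representing the index
    resolution_term = np.sum(p) * np.log(1.0 / num_cells)   # every cell has length 1/N
    return data_term + resolution_term

for k in (0, 2, 4, 8, 12):
    print(2 ** k, objective(2 ** k), h_true)
```

With a single cell the data term vanishes but the value is far from $\mathfrak{h}^{1}(\mathsf{x})$; refining the partition raises $H([\mathsf{x}]_{\mathcal{Q}})$ while the resolution term decreases faster.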

We now turn to a generalization of (67) to rectifiable random 
variables. 

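Theorem 53: Let x be an $m$-rectifiable random variable, i.e., $\mu\mathsf{x}^{-1} \ll \mathcal{H}^{m}|_{\mathcal{E}}$ for an $m$-rectifiable set $\mathcal{E} \subseteq \mathbb{R}^{M}$, with $m \ge 1$ and $\mathcal{H}^{m}(\mathcal{E}) < \infty$. For any $\mathcal{Q} \in \mathcal{P}^{(\mathcal{E})}_{m,\delta}$, the minimal expected binary codeword length of the quantized random variable $[\mathsf{x}]_{\mathcal{Q}}$ satisfies

$$L^*([\mathsf{x}]_{\mathcal{Q}}) \ge \mathfrak{h}^{m}(\mathsf{x})\,\mathrm{ld}\,e - \mathrm{ld}\,\delta. \tag{70}$$

Furthermore, for each $\varepsilon > 0$, there exists $\delta_\varepsilon > 0$ such that the following holds: for each $\delta \in (0,\delta_\varepsilon)$, there exists a partition $\mathcal{Q}_\delta \in \mathcal{P}^{(\mathcal{E})}_{m,\delta}$ such that

$$L^*([\mathsf{x}]_{\mathcal{Q}_\delta}) \le \mathfrak{h}^{m}(\mathsf{x})\,\mathrm{ld}\,e - \mathrm{ld}\,\delta + 1 + \varepsilon. \tag{71}$$

Proof: See Appendix J. We note that the proof is based on (67) and the expression of $\mathfrak{h}^{m}(\mathsf{x})$ given in (69). ■

The lower bound (70) shows the following: if we want a quantization $\mathcal{Q}$ of x with good resolution (in the sense that $\mathcal{H}^{m}(\mathcal{A}) \le \delta$ for all $\mathcal{A} \in \mathcal{Q}$), then we have to use at least $\mathfrak{h}^{m}(\mathsf{x})\,\mathrm{ld}\,e - \mathrm{ld}\,\delta$ bits to represent this quantized random variable using an instantaneous code. However, by the upper bound (71), we know that for a sufficiently fine resolution (i.e., $\delta < \delta_\varepsilon$), that resolution $\delta$ can be achieved by using at most $1 + \varepsilon$ additional bits (in addition to the lower bound $\mathfrak{h}^{m}(\mathsf{x})\,\mathrm{ld}\,e - \mathrm{ld}\,\delta$).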
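The bounds of Theorem 53 can be explored numerically for a toy source with density $f(x) = 2x$ on $\mathcal{E} = [0,1]$ (a hypothetical choice, $\mathfrak{h}^{1}(\mathsf{x}) = 1/2 - \log 2$). Equal cells of length $\delta = 1/N$ form a $(1,\delta)$-partition; the Python sketch computes the Huffman expected length $L^*([\mathsf{x}]_{\mathcal{Q}})$ and reports its overhead above the lower bound (70). By (70) this overhead is nonnegative, and (71) states that some $(1,\delta)$-partition keeps it within $1 + \varepsilon$ bits for small $\delta$.

```python
import heapq
import math

def huffman_expected_length(probs):
    """Expected length (bits) of a binary Huffman code: the sum of all merged weights."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b          # each merge adds one bit to all mass below it
        heapq.heappush(heap, a + b)
    return total

h_bits = (0.5 - math.log(2)) / math.log(2)   # h^1(x) ld e for the density 2x on [0, 1]

results = []
for N in (8, 64, 512):
    delta = 1.0 / N                                    # cell length of the uniform partition
    probs = [(2 * i + 1) / N ** 2 for i in range(N)]   # Pr{x in [i/N, (i+1)/N)}
    L_star = huffman_expected_length(probs)
    lower = h_bits - math.log2(delta)                  # right-hand side of (70)
    results.append((lower, L_star))
    print(N, lower, L_star, L_star - lower)            # last column: overhead in bits
```

Uniform cells are only one convenient member of $\mathcal{P}^{(\mathcal{E})}_{1,\delta}$; Theorem 53 does not say which partition attains (71).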

B. Expected Codeword Length of Sequences of Integer-Dimensional Random Variables

We will now apply Theorem 53 to sequences of i.i.d. random variables. To this end, we consider quantizations of an entire sequence, $[\mathsf{x}_{1:n}]_{\mathcal{Q}} = [(\mathsf{x}_1,\dots,\mathsf{x}_n)]_{\mathcal{Q}}$ with¹⁵ $\mathcal{Q} \in \mathcal{P}^{(\mathcal{E}^n)}_{nm,\delta^n}$. We denote by

$$L^*_n([\mathsf{x}_{1:n}]_{\mathcal{Q}}) = \frac{L^*([\mathsf{x}_{1:n}]_{\mathcal{Q}})}{n} \tag{72}$$

the minimal expected binary codeword length per source symbol.

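Corollary 54: Let $\mathsf{x}_{1:n} = (\mathsf{x}_1,\dots,\mathsf{x}_n)$ be a sequence of i.i.d. $m$-rectifiable random variables ($m \ge 1$) on $\mathbb{R}^{M}$ with $m$-dimensional entropy $\mathfrak{h}^{m}(\mathsf{x})$ and support $\mathcal{E}$ satisfying $\mathcal{H}^{m}(\mathcal{E}) < \infty$. Then, for each $\varepsilon > 0$, there exists $\delta_\varepsilon > 0$ such that the following holds: for each $\delta \in (0,\delta_\varepsilon)$, there exists a partition $\mathcal{Q} \in \mathcal{P}^{(\mathcal{E}^n)}_{nm,\delta^n}$ such that the minimal expected binary codeword length per source symbol satisfies

$$\mathfrak{h}^{m}(\mathsf{x})\,\mathrm{ld}\,e - \mathrm{ld}\,\delta \le L^*_n([\mathsf{x}_{1:n}]_{\mathcal{Q}}) \le \mathfrak{h}^{m}(\mathsf{x})\,\mathrm{ld}\,e - \mathrm{ld}\,\delta + \frac{1+\varepsilon}{n}. \tag{73}$$

15 We choose partitions $\mathcal{Q}$ of resolution $\delta^n$, i.e., the sets $\mathcal{A} \in \mathcal{Q}$ satisfy $\mathcal{H}^{nm}(\mathcal{A}) \le \delta^n$. This choice is made for consistency with the case of partitions $\mathcal{Q}$ of $\mathcal{E}^n$ that are constructed as products of sets $\mathcal{A}_i$ in $\mathcal{Q}_i \in \mathcal{P}^{(\mathcal{E})}_{m,\delta}$. More specifically, for $\mathcal{A} = \mathcal{A}_1 \times \cdots \times \mathcal{A}_n$ with $\mathcal{A}_i \in \mathcal{Q}_i$, we have $\mathcal{H}^{m}(\mathcal{A}_i) \le \delta$ and $\mathcal{H}^{nm}(\mathcal{A}) \le \delta^n$, and the sets $\mathcal{A}$ cover $\mathcal{E}^n$, i.e., $\{\mathcal{A} = \mathcal{A}_1 \times \cdots \times \mathcal{A}_n : \mathcal{A}_i \in \mathcal{Q}_i\} \in \mathcal{P}^{(\mathcal{E}^n)}_{nm,\delta^n}$.

Proof: By Corollary 29, the random variable $\mathsf{x}_{1:n}$ is $nm$-rectifiable with $\mu(\mathsf{x}_{1:n})^{-1} \ll \mathcal{H}^{nm}|_{\mathcal{E}^n}$ and $nm$-dimensional entropy $\mathfrak{h}^{nm}(\mathsf{x}_{1:n}) = n\,\mathfrak{h}^{m}(\mathsf{x})$. Thus, by Theorem 53, there exists $\tilde{\delta}_\varepsilon > 0$ such that the following holds:

(*) For all $\tilde{\delta} \in (0,\tilde{\delta}_\varepsilon)$, there exists a partition $\mathcal{Q} \in \mathcal{P}^{(\mathcal{E}^n)}_{nm,\tilde{\delta}}$ such that

$$n\,\mathfrak{h}^{m}(\mathsf{x})\,\mathrm{ld}\,e - \mathrm{ld}\,\tilde{\delta} \le L^*([\mathsf{x}_{1:n}]_{\mathcal{Q}}) \le n\,\mathfrak{h}^{m}(\mathsf{x})\,\mathrm{ld}\,e - \mathrm{ld}\,\tilde{\delta} + 1 + \varepsilon.$$

Define $\delta_\varepsilon = \tilde{\delta}_\varepsilon^{1/n}$ and let $\delta \in (0,\delta_\varepsilon)$. We have that $\delta \in (0,\delta_\varepsilon)$ is equivalent to $\delta^n \in (0,\tilde{\delta}_\varepsilon)$. Thus, by (*) for the specific case $\tilde{\delta} = \delta^n$, there exists a partition $\mathcal{Q} \in \mathcal{P}^{(\mathcal{E}^n)}_{nm,\delta^n}$ such that

$$n\,\mathfrak{h}^{m}(\mathsf{x})\,\mathrm{ld}\,e - \mathrm{ld}\,\delta^n \le L^*([\mathsf{x}_{1:n}]_{\mathcal{Q}}) \le n\,\mathfrak{h}^{m}(\mathsf{x})\,\mathrm{ld}\,e - \mathrm{ld}\,\delta^n + 1 + \varepsilon.$$

Dividing by $n$ and using (72) gives (73). ■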

Corollary 54 shows that the upper bound on the expected codeword length per source symbol becomes closer to the lower bound $\mathfrak{h}^{m}(\mathsf{x})\,\mathrm{ld}\,e - \mathrm{ld}\,\delta$ if we are allowed to quantize and code entire sequences. However, note that using the quantization $\mathcal{Q} \in \mathcal{P}^{(\mathcal{E}^n)}_{nm,\delta^n}$ of the joint random variable $\mathsf{x}_{1:n}$, it is not guaranteed that we can reconstruct each $\mathsf{x}_i$ to within a set $\mathcal{A}_i$ satisfying $\mathcal{H}^{m}(\mathcal{A}_i) \le \delta$. All we know is that each $\mathcal{A} \in \mathcal{Q}$ satisfies $\mathcal{H}^{nm}(\mathcal{A}) \le \delta^n$, i.e., the overall resolution of the sequence is good, but the resolution of each individual source symbol is not necessarily good as well.
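Corollary 54 can be illustrated with the product partitions described in footnote 15. For a hypothetical i.i.d. source with density $2x$ on $[0,1]$ and $N = 8$ cells per symbol, the Python sketch below Huffman-codes blocks of $n = 1, 2, 3$ quantized symbols and reports $L^*_n$ from (72). Applying (67) to the block shows that $L^*_n$ stays below $H([\mathsf{x}]_{\mathcal{Q}_1})\,\mathrm{ld}\,e + 1/n$, where $\mathcal{Q}_1$ denotes the single-symbol partition, so the per-symbol overhead shrinks as $n$ grows.

```python
import heapq
import itertools
import math

def huffman_expected_length(probs):
    """Expected length in bits of a binary Huffman code (sum of all merged weights)."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

# Hypothetical source: i.i.d. x_i with density 2x on [0, 1]; cells of length delta = 1/N
# per symbol, combined into product cells of [0, 1]^n (cf. footnote 15).
N = 8
delta = 1.0 / N
cell_probs = [(2 * i + 1) / N ** 2 for i in range(N)]
H1_bits = -sum(p * math.log2(p) for p in cell_probs)          # H([x]_Q1) in bits
lower = (0.5 - math.log(2)) / math.log(2) - math.log2(delta)  # h^1(x) ld e - ld delta

per_symbol = {}
for n in (1, 2, 3):
    block_probs = [math.prod(c) for c in itertools.product(cell_probs, repeat=n)]
    per_symbol[n] = huffman_expected_length(block_probs) / n   # L*_n as in (72)
    print(n, lower, per_symbol[n], H1_bits + 1 / n)
```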


X. Shannon Lower Bound for 
Integer-Dimensional Sources 


As a second application of the proposed entropy definition, we present a lower bound on the rate-distortion (RD) function of integer-dimensional sources. The RD function for a source x and a distortion function $d(\cdot,\cdot)$ is defined as [4, eq. (4.1.3)]

$$R(D) = \inf_{\mathrm{E}_{(\mathsf{x},\mathsf{y})}[d(\mathsf{x},\mathsf{y})] \le D} I(\mathsf{x};\mathsf{y})$$

for $D > 0$, where the constrained infimum is taken over all joint probability distributions of (x, y) with the given probability distribution of x as the first marginal. We will consider throughout this section a source random variable x on $\mathbb{R}^{M}$ and a translation invariant distortion function $d(\cdot,\cdot)$ on $\mathbb{R}^{M} \times \mathbb{R}^{M}$, i.e., $d(x,y) = d(x-y,0)$ for all $x,y \in \mathbb{R}^{M}$. Furthermore, we assume that $d(\cdot,\cdot)$ satisfies $\inf_{y\in\mathbb{R}^{M}} d(x,y) = 0$ for each $x \in \mathbb{R}^{M}$. We also assume that there exists $D > 0$ such that $R(D)$ is finite, and we denote by $D_0$ the infimum of all such $D$. Finally, we assume that there exists a finite set $\mathcal{B} \subseteq \mathbb{R}^{M}$ such that $\mathrm{E}_{\mathsf{x}}[\min_{y\in\mathcal{B}} d(\mathsf{x},y)] < \infty$. This assumption guarantees that there exists a finite quantization of x with bounded expected distortion. Under these standard assumptions, we have the following characterization of the RD function [26, Th. 2.3]: For each $D > D_0$,

$$R(D) = \max_{s\ge 0}\, \max_{\alpha_s(\cdot)} \big( -sD + \mathrm{E}_{\mathsf{x}}[\log\alpha_s(\mathsf{x})] \big) \tag{74}$$

where the second maximization is with respect to all functions¹⁶ $\alpha_s\colon \mathbb{R}^{M} \to (0,\infty)$ satisfying

$$\mathrm{E}_{\mathsf{x}}\big[\alpha_s(\mathsf{x})\, e^{-s d(\mathsf{x},y)}\big] \le 1 \tag{75}$$

for each $y \in \mathbb{R}^{M}$.

A. Shannon Lower Bound

The most common form of the traditional Shannon lower bound [4, Sec. 4.3] for a discrete source x is the inequality

$$R(D) \ge H(\mathsf{x}) - \max H(\mathsf{w}) \tag{76}$$

where the maximum is taken over all discrete random variables w whose expected distortion relative to 0 is equal to $D$, i.e., $\mathrm{E}_{\mathsf{w}}[d(\mathsf{w},0)] = D$. An important aspect of the bound (76) is that the contribution of the source x and the contribution of the distortion function and distortion $D$ become separated. For a fixed distortion function and a given distortion, we can calculate $\max H(\mathsf{w})$ and then use the bound (76) for different sources x simply by calculating their entropy $H(\mathsf{x})$.

For a continuous random variable x on $\mathbb{R}^{M}$, a bound similar to (76) can also be derived under certain assumptions. However, it is more convenient to state the continuous Shannon lower bound in the following parametric form (i.e., involving a parameter $s \ge 0$) [4, Sec. 4.6]

$$R(D) \ge h(\mathsf{x}) - sD - \log\gamma(s) \tag{77}$$

where

$$\gamma(s) = \int_{\mathbb{R}^{M}} e^{-s d(x,0)}\, d\mathcal{H}^{M}(x) \tag{78}$$

and (77) holds for all $s \ge 0$. The right-hand side of (77) can be maximized with respect to $s$, and it turns out that [4, Lem. 4.6.2]

$$\min_{s\ge 0}\big(sD + \log\gamma(s)\big) = \max h(\mathsf{w})$$

where the maximum is taken over all continuous random variables w such that $\mathrm{E}_{\mathsf{w}}[d(\mathsf{w},0)] = D$. This again results in the simple formula (cf. (76))

$$R(D) \ge h(\mathsf{x}) - \max h(\mathsf{w}).$$

Because the parametric bound (77) is more convenient in most cases and already allows us to separate the source from the distortion, we will concentrate on a generalization of (77) to rectifiable random variables. To this end, we will use the characterization of the RD function in (74) with a specific choice of the function $\alpha_s$.

Theorem 55: The RD function of an $m$-rectifiable random variable x on $\mathbb{R}^{M}$ with support $\mathcal{E}$ is lower bounded by

$$R(D) \ge R_{\mathrm{SLB}}(D,s) \triangleq \mathfrak{h}^{m}(\mathsf{x}) - sD - \log\tilde{\gamma}(s) \tag{79}$$

for each $s \ge 0$, where

$$\tilde{\gamma}(s) = \sup_{y\in\mathbb{R}^{M}} \int_{\mathcal{E}} e^{-s d(x,y)}\, d\mathcal{H}^{m}(x), \qquad s \ge 0. \tag{80}$$

16 Although in [26, Th. 2.3] $\alpha_s(x) > 1$ is assumed, (74) also holds for $\alpha_s(x) > 0$ because of [26, eq. (1.23)].

Proof: We start by noting that (80) implies

$$\int_{\mathcal{E}} e^{-s d(x,y)}\, d\mathcal{H}^{m}(x) \le \tilde{\gamma}(s) \tag{81}$$

for all $y \in \mathbb{R}^{M}$. Let $s \ge 0$ be fixed. By (74),

$$R(D) \ge -sD + \mathrm{E}_{\mathsf{x}}[\log\alpha_s(\mathsf{x})] \tag{82}$$

for every function $\alpha_s$ satisfying (75). We have (cf. (15))

$$\begin{aligned}
\mathrm{E}_{\mathsf{x}}\!\left[\frac{e^{-s d(\mathsf{x},y)}}{\theta^{m}_{\mathsf{x}}(\mathsf{x})\,\tilde{\gamma}(s)}\right] &= \int_{\mathcal{E}} \frac{e^{-s d(x,y)}}{\theta^{m}_{\mathsf{x}}(x)\,\tilde{\gamma}(s)}\, \theta^{m}_{\mathsf{x}}(x)\, d\mathcal{H}^{m}(x) \\
&= \frac{1}{\tilde{\gamma}(s)} \int_{\mathcal{E}} e^{-s d(x,y)}\, d\mathcal{H}^{m}(x) \\
&\le \frac{\tilde{\gamma}(s)}{\tilde{\gamma}(s)} = 1
\end{aligned}$$

for all $y \in \mathbb{R}^{M}$, where the inequality follows from (81). Therefore, the choice $\alpha_s(x) = 1/(\theta^{m}_{\mathsf{x}}(x)\,\tilde{\gamma}(s))$ satisfies (75). Inserting this choice of $\alpha_s(x)$ into (82), we obtain

$$\begin{aligned}
R(D) &\ge -sD + \mathrm{E}_{\mathsf{x}}\!\left[\log\frac{1}{\theta^{m}_{\mathsf{x}}(\mathsf{x})\,\tilde{\gamma}(s)}\right] \\
&= -\mathrm{E}_{\mathsf{x}}[\log\theta^{m}_{\mathsf{x}}(\mathsf{x})] - sD - \mathrm{E}_{\mathsf{x}}[\log\tilde{\gamma}(s)] \\
&= \mathfrak{h}^{m}(\mathsf{x}) - sD - \log\tilde{\gamma}(s). \qquad ■
\end{aligned}$$


For a continuous random variable x with positive probability density function almost everywhere (i.e., $M$-rectifiable with support $\mathbb{R}^{M}$), the definitions of $\gamma(s)$ in (78) and $\tilde{\gamma}(s)$ in (80) coincide. Indeed, because $d(x,y) = d(x-y,0)$ and a translation of the integrand by $y$ does not change the value of the integral over $\mathbb{R}^{M}$, the right-hand side of (78) can be written as (recall that $\mathcal{H}^{M} = \mathcal{L}^{M}$)

$$\int_{\mathbb{R}^{M}} e^{-s d(x,0)}\, d\mathcal{H}^{M}(x) = \int_{\mathbb{R}^{M}} e^{-s d(x,y)}\, d\mathcal{H}^{M}(x) \tag{83}$$

for any $y \in \mathbb{R}^{M}$. Because the left-hand side of (83) does not depend on $y$, taking the supremum over $y \in \mathbb{R}^{M}$ in (83) results in

$$\int_{\mathbb{R}^{M}} e^{-s d(x,0)}\, d\mathcal{H}^{M}(x) = \sup_{y\in\mathbb{R}^{M}} \int_{\mathbb{R}^{M}} e^{-s d(x,y)}\, d\mathcal{H}^{M}(x)$$

which is (80). Thus, for a continuous random variable x with positive probability density function almost everywhere, the Shannon lower bounds (77) and (79) coincide. However, for a continuous random variable x whose support $\mathcal{E}$ is a proper subset of $\mathbb{R}^{M}$, we have $\tilde{\gamma}(s) \le \gamma(s)$, and thus the Shannon lower bound (79) is at least as tight (i.e., at least as large) as (77). This is due to the fact that (79) incorporates the additional information that the random variable is restricted to $\mathcal{E}$.


B. Maximizing the Shannon Lower Bound 

The optimal choice of s in (79) depends on D and is hard 
to find in general. At least, the following lemma states that 
the optimal (i.e., largest) lower bound in (79), 

R* SL b(D)= sn P R SLB (D,s) 

s> 0 

is achieved for a finite s. We recall that Dq is the infimum of 
all D > 0 such that R(D) is finite. 

Lemma 56: Let x be an ?n-rectifiable random variable with 
support £ and finite //(-dimensional entropy f) m (x). Then for 
D > Dq the lower bound Rslb(D, s) in (79) satisfies 


lim R slb (D, s) = -oo . 

s —^OO 


Proof: See Appendix K. ■ 

If Rslb(D,s) is a continuous function of s. Lemma 56 
implies that for a fixed D > Dq, the global maximum of 
Rslb(D,s) with respect to s exists and is either a local 
maximum or the boundary point s = 0 , i.e., H (D) = 
Rslb(D,s) for some finite s > 0. Moreover, if 7(s) in (80) 
is differentiable, we can characterize the local maxima of 
Rslb(D,s) as follows. 

Theorem 57: Let x be an ?n-rectifiable random variable with 
support £, and let 7(s) be differentiable. Then for D > Dq, 
the lower bound Rs BB (D, s) in (79) is maximized either for 
s = 0 or for some s > 0 satisfying D (s) = D, where 




7 ; (g) 
7(s) ' 


That is, the largest lower bound is given by 


R^lb( d ) = max \ Rslb(D, 0), sup Rslb(D, s ) 

(84) 

Proof: We recall from (79) that Rslb(D,s) = fi m (x) — 
sD— log 7 (s). Thus, because 7 (s) is differentiable, a necessary 
condition for a local maximum of Rslb(D, s ) with respect to 
s is obtained by setting to zero the derivative of R$ bb (D, s) 
with respect to s. Solving the resulting equation for D yields 
D(s) = D. Thus, for a given D > Dq, R^ b (D,s) can only 
have a local maximum at s G (0, oo) satisfying D(s) = D. By 
Lemma 56, the global maximum either is a local maximum 
or is achieved for s — 0 , which concludes the proof. ■ 

If 7 (s) is differentiable. Theorem 57 provides a “parame- 
trization" of the graph of the largest bound R^ LB (D), i.e., we 
can characterize the set 


G={(D,R* slb (D)) gR 2 :D>Dq}. (85) 

As a basis for this characterization, we define the sets 

J 7 ! = {(D(s),Rslb(D(s),s)) : s > 0 } 

D 2 = {(D, () m (x)-logJ^ m (£)) :D>D 0 } ( 86 ) 

which are illustrated in Fig. 1. Note that T\ is not necessarily 
the graph of a function, whereas constitutes a horizontal 
line in the (D, R) plane. 

Corollary 58: Let x be an ?n-rectifiable random variable 
with support £, and let 7 (s) be differentiable. Define T = 
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Fig. 1. Illustration of the sets T\, and T (assuming Dq = 1). 


T\ U F>. Then Q = T, where T is the upper envelope of T 
given by 


smoothness conditions, the supremum in (80) is in fact a 
maximum, i.e., 

7 (a) = max [ e ~ sd ^' y) dJf m (x) (93) 

yeR M Je 

and D(s) can be rewritten as 

D(s) = D*(s) — —7—r / d(x,y(s))e- sd(x ' y( - s)) djr m (x) 
7(s) Je 

where y(s) is the maximizing value in the definition of 7 (s) 
(cf. (80)): 

y(s) = argmax [ e ~ sd ^ v) dJ^ m (x). 
yeR M Je 

(Thus, 7 (s) = f g e ~ sd ( x ’y( s d dJff m (x).) The following 

corollary shows that even if we do not know whether 7 (s) 
is differentiable, we can construct a set T of lower bounds on 
the RD function. To this end, we define T = JF\ LJ , where 

Ti 4 {(D*(s),R slb (D*(s),s)) : s > 0} 


T = \ (D, R) £ T : R = max R' \ . (87) 

I (d,r')gp J 

Proof: All elements (D, R) £ T can be written as 
( D,R ) = (D, Rslb(D, s)) for some s > 0. Indeed, for 
(D, R) £ this is obvious, and for (D, R) £ Ti we have 

R = h m (x)-log^ m (f) ( = } h m (x)-log 7 (0) ^ Rslb{D, 0) 

( 88 ) 

where (a) holds because 7 ( 0 ) =' J £ Id 
Hence, for all ( D,R ) £ T, we obtain 

R < sup 7 ?slb (D,s)= Rt; LB (D) . (89) 

5>0 

Because JC J, (89) also holds for ( D , R) £ T. 

Consider now the pair ( D , R) £ T for a fixed D > Dq. 
By (87), for a pair (D,R') £ T we obtain R > R!. 
In particular, for s > 0 satisfying D(s ) = D, the pair 
(D, Rslb{D, s)) belongs to T\ C T, and thus 

R > ^slb(79, s) . (90) 

Similarly, (D, h m (x) - log^ m (f)) eJ 2 CJ, and thus 

R > h m (x) - log ( = } RsMD, 0). (91) 


and T' 2 . was defined in ( 86 ). 

Corollary 59: Let x be an m-rectifiable random variable 
with support £. Then J 7 is a set of lower bounds on the RD 
function, i.e., for each ( D , R) £ JF , we have R(D) > R. 

Proof: Let ( D,R ) £ T. 

Case (D,R) £ In this case, we have ( D,R ) = 
(D*(s),Rslb(D*(s),s)) for some s > 0. Thus, R = 
Rslb{D*(8),8) = Rslb(D,s) and, by (79), R < R(D). 

Case ( D , R) £ Ti: In this case, as in ( 88 ), we have R = 
RsL B (D,d). By (79), we have i?. SLB (D,0) < R(D). which 
implies R < R(D). 

In either case R < R(D), which concludes the proof. ■ 

By Corollary 59, we can use the sets T\ and JF 2 to construct 
lower bounds on the RD function . 17 More specifically, these 
bounds are obtained via the following program: 

(PI) Calculate D*(s) for s £ (0,oo). 

(P2) Plot the s-parametrized curve (l?*(s), Rslb(D*{s), s)) 
for s £ ( 0 , 00 ). 

(P3) Plot the horizontal line (U, h m (x) — log Jf m {£)) for 
D £ (D 0 , 00 ). 

(P4) Take the upper envelope of these two curves. 

In the subsequent Section X-C, we will apply the program 
(P1)-(P4) to a specific example. 


Combining (90) for all s > 0 satisfying D(s) = D and (91), 
we obtain 

R > max | 7 ?slb(-D, 0), sup J?slb(-D, s) j ( = Rslb( d ) ■ 
v s>0:D(s)—D J 

(92) 

Combining (89) and (92) for an arbitrary ( D,R ) £ T implies 
that R = i?| LB (D). By (85), this yields ( D,R ) £ Q and 
thus T C G- Because both sets Q and T contain exactly one 
element (D, R) for each D > Dq, we obtain T = Q. ■ 
In certain cases, it may not be possible to differentiate 
7(s), and thus the direct calculation of D(s ) = — 7 , (s)/7(s) 
is not possible. However, one can show that, under certain 


C. Shannon Lower Bound on the Unit Circle 

To demonstrate the practical relevance of Theorem 55, we 
apply it to the simple example given by £ = S \, i.e., the 
unit circle in M 2 , and squared error distortion, i.e., d(x. y) = 
||x — y|| 2 . In order to calculate 7 (s), we first show that it can 
be expressed as in (93), i.e., 

7 (s) = max [ e ~ s W x ~ y W dd / F 1 (x) 

3/6R 2 7 5l 

17 lf Pi = Pi, we obtain by Corollary 58 that these bounds will be the 
best Shannon lower bounds. However, explicit smoothness conditions that 
guarantee T\ = are difficult to find. 
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for all s > 0. Let s > 0 be arbitrary but fixed. Note that we 
can restrict to y = ( y\ 0) T , with 2/1 > 0, because the problem 
is invariant under rotations. Thus, 

f e -°\\*-y \\ 2 = [ e -' s ((* i -3/ i)2 + x l ) dJf’ 1 (x) 

Jsi JSi 

and therefore we have to maximize the function 


fs(yi)= [ 

./Si 

on [0,oo). To this end, we consider the derivative f' s and 
change the order of differentiation and integration (according 
to [27, Cor. 5.9], this is justified because is a finite 

measure and 0 < e~ s ^ Xl ~ Vl ' )2+x ^ < 1 for (x\ X 2 ) T G Si). 
This results in the expression 


f s (yi)= f 2s(x 1 -y 1 )e- s ^-y^ +x ^dJf 1 (x). (94) 
Js i 



1CL 2 10 -1 10° 10 1 10 2 10 3 


s 

Fig. 2. Graph of 7 (s) for s G [0.01,5000]. 


Because X\ < \ for x G Si, we have f s {yi) < 0 for 
2 /i > 1, i.e., f s is monotonically decreasing on (l,oo). 
Thus, the function f s can only attain its maximum in the 
compact interval [0,1], Because f s is a continuous function, 
we conclude that 7 (s) = max y6 g 2 f s e - s ll a; -i'll d Jf 1 {x) 
exists for each s > 0. 

To characterize 7 (s) in more detail, we consider the equa¬ 
tion /'(t/i) = 0 to find local maxima. By (94) and because 
x\ + x\ = 1 for x G Si, fs(yi) = 0 is equivalent to 

2 se~ s(1+v2) f {x 1 -y 1 )e 2sxiyi dM ?1 (x) = 0. (95) 

JS i 

Furthermore, because 2se~ s ^ 1+y2 ^ > 0 and using the trans¬ 
formation X\ = cos cj), X 2 = sin <f>, we obtain that (95) is 
equivalent to 

r2n 

/ (cos </> — t/i) e 2syi cos ^ dcj) = 0 . (96) 

Jo 

Because we know that the function f s can only have zeros on 
[0,1], we can solve (96) numerically for any fixed s > 0 and 
compare the values f s (yi) at the different solutions 2/1 and at 
the boundary points 2/1 = 0 and 2/1 = 1 to find 7 (s). In Fig. 2, 
the values of 7 (s) are depicted for s G [0.01, 5000]. 

We now have all the ingredients to calculate the parametric 
lower bound /i’slb ( s) in (79) for any given distortion D 
and an arbitrary source x on Si. In particular, let us consider 
a uniform distribution of x on Si, where (^(x) = log(2-7r) 
(see (37)). In Fig. 3, we show the lower bound Rslb(D, s) for 
s G [1,94] and distortion D = 10~ 2 . It can be seen that the 
maximal lower bound .Rslb(10~ 2 , s) is obtained for s s=s 50. 

To plot Fig. 3, we had to calculate 7 (s) for many different 
values of s. We also used “trial and error” to find the region 
of s where the maximal lower bound 7?slb(10 -2 , s) arises. To 
avoid this tedious optimization procedure, which would have 
to be carried out for each value of D under consideration, we 
can use the program (P1)-(P4) formulated in Section X-B. In 
Fig. 4, we show the lower bounds on R(D) resulting from 
this program for s G [1,10 5 ], which corresponds to D G [5 • 
10~ 5 ,1], We also show in Fig. 4 an upper bound on R(D) 
using the following result. 



Fig. 3. Shannon lower bound -Rslb(10 2 ■ -s) for .s G [1,94]. 


Theorem 60: Let the random variable x on R 2 be uniformly 
distributed on the unit circle, and consider squared error 
distortion, i.e., d(x,y) = ||a; — y || 2 . For any nGN, 

R(D n )<\ogn (97) 

where 

D n = l- f-sin-) . (98) 

\7T n j 

Proof: See Appendix L. ■ 

The upper bound depicted in Fig. 4 was obtained by linearly 
interpolating the upper bounds (97) corresponding to different 
values of n (and, hence, of D n ). This is justified by the 
convexity of the RD function [13, Lem. 10.4.1]. Note that 
the lower and upper bounds shown in Fig. 4 are quite close, 
and thus they provide a rather accurate characterization of the 
RD function of x. 


XI. Conclusion 

We presented a generalization of entropy to singular ran¬ 
dom variables supported on integer-dimensional subsets of 
Euclidean space. More specifically, we considered random 
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D 

Fig. 4. Shannon lower bound constructed by (P1)-(P4) and upper bound 
(97) for a source x on R 2 uniformly distributed on the unit circle and squared 
error distortion. 


variables distributed according to a rectifiable measure. Sim¬ 
ilar to continuous random variables, these rectifiable random 
variables can be described by a density. However, in contrast 
to continuous random variables, the density is nonzero only 
on a lower-dimensional subset and has to be integrated with 
respect to a Hausdorff measure to calculate probabilities. 
Our entropy definition is based on this Hausdorff density 
but otherwise resembles the usual definition of differential 
entropy. However, this formal similarity has to be interpreted 
with caution because Hausdorff measures and projections of 
rectifiable sets do not always conform to intuition. We thus 
emphasized mathematical rigor and carefully stated all the 
assumptions underlying our results. 

We showed that for the special cases of rectifiable random 
variables given by discrete and continuous random variables, 
our entropy definition reduces to classical entropy and dif¬ 
ferential entropy, respectively. Furthermore, we established a 
connection between our entropy and differential entropy for a 
rectifiable random variable that is obtained from a continuous 
random variable through a one-to-one transformation. For joint 
and conditional entropy, our analysis showed that the geometry 
of the support sets of the random variables plays an important 
role. This role is evidenced by the facts that the chain rule 
may contain a geometric correction term and conditioning may 
increase entropy. 

Random variables that are neither discrete nor continuous 
are not only of theoretical interest. Continuity of a ran¬ 
dom variable cannot be assumed if there are deterministic 
dependencies reducing the intrinsic dimension of the ran¬ 
dom variable, which is especially likely to occur in high¬ 
dimensional problems. As two basic examples, we considered 
a random variable x G ffi 2 supported on the unit circle, 
which is intrinsically only one-dimensional, and the class of 
positive semidefinite rank-one random matrices. In both cases, 
the differential entropy is not defined and, in fact, classical 
information theory does not provide a rigorous definition of 
entropy for these random variables. 

As an application of our entropy definition to source coding. 


we provided a characterization of the minimal codeword length 
for quantizations of integer-dimensional sources. Furthermore, 
we presented a result in rate-distortion theory that generalizes 
the Shannon lower bound for discrete and continuous random 
variables to the larger class of rectifiable random variables. The 
usefulness of this bound was demonstrated by the example of a 
uniform source on the unit circle. The resulting bound appears 
to be the first rigorous lower bound on the rate-distortion 
function for that distribution. 

Possible directions for future work include the extension 
of our entropy definition to distributions mixing different 
dimensions (e.g., discrete-continuous mixtures). The extension 
to noninteger-dimensional singular distributions seems to be 
possible only in terms of upper and lower entropies, which 
could be defined based on the upper and lower Hausdorff 
densities 18 [14, Def. 2.55], Furthermore, our entropy can be 
extended to infinite-length sequences of rectifiable random 
variables, which leads to the definition of an entropy rate 
generalizing the (differential) entropy rate of a sequence of 
discrete or continuous random variables. Finally, applications 
of our entropy to source coding and channel coding problems 
involving integer-dimensional singular random variables are 
largely unexplored. 

Appendix A 
Proof of Lemma 9 

To prove the existence of a support £ C £, we have to 
construct a set £ that satisfies (cf. Definition 8) 

(i) £ = UfcGN fk{Ck) where, for k S N, C k Q ffi™ is a 
bounded Borel set and f k : ffi™ —> ffi M is a Lipschitz 
function that is one-to-one on C&; 

(ii) £ C £■ 

(hi) 

(iv) > 0 ^™ -almost everywhere. 

To prove (i), we note that, by (9), the m-rectifiable set 
£ satisfies £ C £ 0 U UfcgN fk{Ak) with bounded Borel sets 
Ak C R m , Lipschitz functions f k : ffi™ —> ffi M that are one- 
to-one on Ak, and Jf? m (£o) = 0. Because /i <C J^™|g, we 
obtain /i <C where £* = (J keN fk(Ak)- Thus, the 

Radon-Nikodym derivative exists. Note that 

is in fact an equivalence class of measurable functions and 
only defined up to a set of -measure zero. Because 

fi(£ c ) = 0 and g,((£*) c ) = 0, we can choose a function g 
in the equivalence class of satisfying g(x) = 0 on 

(£ n£*) c . Since g is a measurable function, the set ff _1 ({0}) 
is ./^'"'-measurable. Furthermore, because £* is m-rectifiable. 
Property 1 in Lemma 4 implies that the subset g -1 ({0}) fl£* 
is again m-rectifiable. By [28, Lem. 15.5(4)], there exists a 
Borel set Bo satisfying 

<r 1 ({0})n£* (99) 

and j4? m (Bo \ (<7 _1 ({0}) fl£ *)) = 0. The absolute continuity 
/r <C <C then implies 

H{B 0 \ (3 -1 ({0}) n £*)) = 0. (100) 

18 The upper and lower Hausdorff densities exist for arbitrary distributions, 
whereas, by Preiss’ Theorem [16, Th. 5.6], the existence of the Hausdorff 
density implies that the measure is rectifiable. 
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We further have 


KBo) < p(B 0 \ Gr'do}) n £*)) + rtg-Hm n £*)) 
<M^o\(5- 1 ({0})nr))+ / r(3- 1 ({0})) 

( = } 0 ( 101 ) 

where (a) holds by (100) and because At(ff _1 ({0})) = 

J & _ ;L ({ 0 }) g(x) dJf’ m \ £ » (x) = 0. Let us define 

£={J fk(A k \fkHB 0 )) ( 102 ) 

ke N 


where Ak \ f : f 1 (Bo) are bounded Borel sets (this is because 
Ak are bounded Borel sets, fk are continuous functions, and 
Bo is a Borel set). As j)- are Lipschitz functions that are one- 
to-one on Ak, and thus also on Ak \ fif (Bo), this shows 

(i). 

Next, we prove (ii). We have y G fk{Ak \ fk 1 (Bo)) if and 
only if there exists x G Ak \ fjf (Bo) such that fk(x) = y, 
which in turn holds if and only if there exists x' G Ak such 
that fk(x') = y and y f, B 0 . Hence, f k (A k \ f^iB 0 )) = 
fk{Ak) \ Bo- We can thus rewrite £ in (102) as 

£= |J fk(A k )\Bo = £*\Bo C £*\(g-\{0})n£*) (103) 
ke N 

where the final inclusion holds by (99). Because we chose 
g(x) = 0 on ( £D£*) C = £ C U(£*) C , we obtain £ c C p- 1 ({0}). 
Inserting this into (103) yields 

£c£*\(£ c n £*) =Tn(fu (£*) c ) = £* n £ c £ 


which is (ii). 

To prove (iii), we start with an arbitrary .^'"'-measurable 
set A C R m with Jtf m \g(A) = 0. We have 

jr m \ £ *(A\Bo)=Jr m (£*n(A\B 0 )) 

= jr m (£* nAnBH) 

= Jf m ((£*\B n )nA) 

= jf” n (£nA) 

= J4f m \z(A) 

= 0 


where (a) holds because £ = £*\B o by (103). Because y <C 
J% >m \ £ *, this implies y(A \ Bo) = 0 and, since y(Bo) = 0 
by (101), we obtain y(A) = 0. Thus, Jif m \^(A) = 0 implies 
y(A) = 0, which proves (iii). 

To prove (iv), we first show that g is also in the equivalence 
class of the Radon-Nikodym derivative dJ ^m 7 . ■ Indeed, we 


have for an arbitrary measurable set A C 

p(A) = f g{x)dJf’ m \ £ .{x) 

Ja 

= [ g(x)dJT n \ £ *(x)+ [ g(x)dJf m \ £ *(x) 

J An£ J An£ c 

= [ g(x)djr m \ i (x)+ [ _ g(x)dJ(r n \e.(x) 

Ja J ac[£ c 

= f g(x) dJf m \g(x) + y(A D £ c ) 

Ja' 

- [ g(x) dJf ,m \g(x) 

Ja 


where (a) holds because £ C £* (see (103)) and ( b ) holds 
because y(A n£ c ) = 0 (indeed Jf m \g(A D £ c ) = JT m (£ n 
A D £f) = 0 implies y(A IT £ c ) = 0 by (iii)). By (103), we 
have £ C £*, which implies 


J^ m \g((£*) c ) = Jf m (£ n ( £*) c ) < Jf m {£* n ( £*) c ) = 0. 

(104) 

By (99), we have Bq C (g _1 ({0})) c U (£*) c . Hence, for x G 
Bo we have either x G (g _1 ({0})) c —which is equivalent 
to g{x) > 0—or x G (£*) c . By (104), we therefore have 
for Jtf m \g-almost all x G that g{x) > 0. In particular, 
because, by (103), £ = £* \ Bq C £>§, we obtain g(x) > 0 for 
■zf' m |^-almost all x G £. This proves (iv). 

Finally, we show that the support is unique up to sets of 
.^'"'-measure zero. Let £\ and £, be two support sets of 
an ?n-rectifiable measure //, and denote the Radon-Nikodym 
derivative dJ ^m| £ by g 2 - Then 



g 2 (x) dJf m \ £2 (x) = y(£ 2 \£ x ) = 0 


(105) 


where the latter equality holds because y{£}) = 0 (indeed, 
Jrf? m \£ 1 (£i) = 0 implies y(£f) = 0 due to y <C J$? m \£ 1 )- 
Since by Definition 8 g 2 > 0 on £ 2 M >m \ £ ^ -almost every¬ 
where, (105) implies £% >m {£ 2 \ £\) = 0. By an analogous 
argument, we obtain M’ m (£\ \ £ 2 ) = 0. This shows that £\ 
and £ 2 coincide up to a set of .^'"'-measure zero. 


Appendix B 
Proof of Theorem 13 


Proof of Part 1: Let x be 0-rectifiable with support £, i.e., 
yyT 1 < Jrf?°\£ for a 0-rectifiable set £. Recall that a 0- 
rectifiable set £ is by definition countable, i.e., £ = {xi : 
i G 1} for a countable index set I. By (16), Pr{x G £} = 1, 
which implies that x is a discrete random variable. Finally, 


Px{xi) = Pr{x = Xi} 

}) 

d/ix _1 


(12) _]/r 

= fix ({ 05 *}) 


J{* i} dJtf°\ £ 

(a) d^X - 1 

dJ4f°\ £ [ l) 

( ^e°(xi) 


(x) d J^°\ £ (x) 
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where (a) holds because is the counting measure. 

Conversely, let x be a discrete random variable taking on 
the values Xi, i € 1. We set £ = {xi : i £ I}, which 
is countable and, thus, O-rectifiable. Because £ includes all 
possible values of x, we have Pr{x £ £ c j = /ix -1 (£ c ) = 0. 
For A C K m , the measure J£’°\ £ (A) counts the number of 
points in A that also belong to £. Thus, for any set A such 
that J(?°\g(A) = 0, we obtain that A IT £ = 0 and hence 
A C £ c . This implies px~ 1 (A ) < /. ix~ x (£ c ) = 0. Thus, we 
showed that p,x~ 1 {A) = 0 for any set A with J# v 0 \ £ (A) = 0, 
i.e., px~ l < Jff°\ g. Hence, x is O-rectifiable. 

Proof of Part 2: Let x be M-rectibable on R M , i.e., 
fix' 1 < for an M-rectifiable set £. Because Jif M 

is equal to the Lebesgue measure Jzf M [18, Th. 2.10.35], 
we obtain //x “ 1 -C A£ M \ £ <C f£ M . Thus, by the Radon- 
Nikodym theorem, there exists the Radon-Nikodym derivative 
/x = satisfying Pr{x £ A} = f A f x (x) dAf M {x) for 

any measurable A C R M , i.e., x is a continuous random 
variable. By (13), 6 x ! = / x _Sf M -almost everywhere. 

Conversely, let x be a continuous random variable on R M 
with probability density function / x . For a measurable set 
A C M m satisfying Af M (A) = 0, we obtain fix' 1 (.4) = 
Pr{x £ ,4} = f A f x (x) dJf M (x) = 0. Thus, we have 
fix' 1 <C P£ M . Because f£ M = = ^ m | r m, this is 

equivalent to fix' 1 <C J4? m \^m. Furthermore, by Property 6 
in Lemma 4, R M is M-rectifiable. It then follows from 
Definition 11 that x is an M-rectifiable random variable. 


Appendix C 
Proof of Theorem 20 


We first note that the set <t>(£) is ?n-rectifiable because £ is 
?n-rectibable and because of Property 3 in Lemma 4. To prove 
that y is ?n-rectibable, we will show that fty ' 1 <C <A? m \</,(£). 
For a measurable set A C R M , we have 


l-iy 1 (^l) = Pr{0(x) £ A} 

= Pr{x £ </> _1 (^l)} 


(14) 


Icf-AA) 


9?(x)dJf m \ £ (x) 


i<l>-^{A)n£ ^ 4 , 

[ e:\f~Hy)) 

I Anc/>(£) Tf^'Hy)) 

r 0?(<t>~Hv)) 


d 


'a 1 {y)) 


. (106) 


Here, (a) holds because of the generalized area formula [14, 
Th. 2.91], and f -1 : (/)(£) -A £ is well debited because <f> is 
one-to-one on £. For a measurable set A C M m satisfying 
A^’ rn \(j,(£){A) = 0, (106) implies fty ' 1 (.4) = 0, i.e., fty ' 1 <C 
Ai? m \<t>(£)■ Thus, y is an m-rectibable random variable. 

By (106), %( 0 -i[y)) equals the Radon-Nikodym derivative 
|^ (g ( y ), and thus we obtain 




d/iy 1 

d^ m U(f) 


(y) 



(107) 


for Jt?™' 1-almost every y £ R . We conclude that 

f) m (y) = - / e™(y)\o g e™iy)djr m (y) 

JM£) 


(107)_ r 1 {y)) 

J<K £) J|(0 _1 (y)) 

x log 


Tf (</>'%)) 


d J? m (y) 


(A [Wix) (0?(x)\ T £ ( ,,^ m( , 

/,W log lW* ( ’ 1 ’ 

= -J 9™{x)log0™(x)d2r m (x ) 

+ / 0?(x) log J| (a) dAi? m {x) 


(15) 


= r(x)+E x [logJ|(x)] 

where (a) holds because of the generalized area formula [14, 
Th. 2.91], 


Appendix D 
Proof of Theorem 28 

Proof of Properties 1 and 3: We brst show that for any 
fi(x, y )' 1 -measurable set A C h m i+m 2 

Mx,y) _1 M) = Pr{(x,y) S A} 

= [ c i W 2 (y)d^ mi+m 2 | £l x£ 2 (®,y). 

■I A 

(108) 

To this end, we brst consider the rectangles A\ x A 2 with 
A-i C -measurable and A 2 C R M2 A?™ 2 - 

measurable. We have 


Pr{(x,y) £ AixA 2 } 


= p r {x £ Ai} Pr{y £ A 2 } 


(14) 

(&) 

(c) 


0?'(x)dj(r"\ ei (x) / 0 y m2 (y)d^ m2 k(y) 


Ai 


> Ai x A 2 


> A 2 


J Ai x A 2 


C 1 (y) d(j^ mi \ £l x Jf m *\ £ 2 )(x,y) 

or (x)or (■ y) dJ? m ' +m > | £l x£2 (*, y) (109) 


where (a) holds because x and y are independent, ( b) 
holds by Fubini’s theorem, and (c) holds by Lemma 27. 
Because the rectangles generate the p(x, y) _1 -measurable 
sets, (109) implies (108). For a p(x, y) _1 -measurable set 
A C RM 1 +M 2 sat i s fyi n g J^ mi+m2 \ £lx£2 (A) = 0, (108) 
implies p(x,y)~ 1 (A) = 0, i.e., y(x, y) _1 < A(f mi+m2 \ £ix£2 
(note that this is Property 3). Furthermore, since x is m\- 
rectibable and y is 7712 -rectibable, it follows from Lemma 27 
that £1 x £ 2 is (mi + W 2 )-rectibable. Hence, according to 
Debnition 11, (x, y) is an (mi + m 2 )-rectibable random 
variable, thus proving Property 1. 

Proof of Property 2: Again due to (108), 


eror = 


dfdx.y) 1 

dJfmi+m 2 \ £ix£2 


( 13 ) 

r\m\-\-rri 2 
~ ^(x.y) 
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Proof of Property 4: We have 

fl mi+m 2 (x.y) 

( = - / ^vT 2 (*. y) lo § d lT 2 (*. v) 

JR M 1+ M 2 

x dJf 7mi+m2 |f lX f 2 (x, y) 

( = - / C 1 W 2 (w) l0 § (C 1 (*)0 y m2 (!/)) 

xd/ rai+m 1 fix£ 2 (x,y) 

= - [ C 1 W 2 (y) log (C 1 (^ y m2 (y)) 

J1 m 1+ m 2 

X d(^ mi | £l X Jf m 2 \ £ 2 )(x,y) 

= -[ [ e^(x)e™(y)(ioger(x) 

J R m 2 JR m i v 

+ log (y)) d^ | £l (X) dJf ™ 2 Is 2 (y) 

= - f 0™ 1 (x) log 0™ 1 (at) dJf"” 1 1 ^ (a) 

2 R m i 

- f 0™ 2 (y) loge™ 2 (y)dJt? m2 \ £2 (y) 

il“2 

( = l} mi (x) + h™ 2 (y) • 

Here, (a) holds by Lemma 27, (6) holds by Fubini’s theorem, 
and (c) holds because, by (16), 

f C 1 (*) d ^ mi \ £l (X) = [ e™ 2 (y) dJT m2 1 £2 (y) = 1. 

2 R"i JR«2 


Appendix E 
Proof of Theorem 31 


We will use the generalized coarea formula [18, Th. 3.2.22] 
several times in our proofs. Because the classical version of 
the generalized coarea formula only holds for sets of finite 
Hausdorff measure, we first present an adaptation that is suited 
to our setting. 

Theorem 6L Let £ C R m i+m 2 an TO - re ctifiable set. Fur¬ 
thermore, let £ 2 = p y (£) C be TO 2 -rectifiable (m 2 < M 2 , 
m - m 2 < Mi), J4? m2 (£ 2 ) < 00 , and J p f y > 0 J? m \ £ - 
almost everywhere. Finally, assume that g: £ —> R is Jt? m - 
measurable and satisfies either of the following properties: 

(i) g(x, y) > 0 J^ m -almost everywhere; 

(ii) f £ \ff(x,y)\dJ^ m (x,y) < 00 . 

Then for all Jf' m_Tra2 -measurable sets A\ C R Ml and FA m ' 2 - 
measurable sets A 2 C R. M2 , 


'(Ai x.A 2 )n£ 


g(x,y) dJf’ m (x,y) 


L 


gjxpy) 
A2f~\£2 EiClfCw) JL (x,y) 


dJf m - m2 (x)d Jf m2 (y) 

( 110 ) 


where £^ = {x £ R Ml : ( x,y ) £ £}. Furthermore, the 
set A\ FI £ v) is (to — ??7.2)-rectifiable for J^ m2 -almost every 

y £ R M2 . 

Proof: By Property 2 in Lemma 4, M 2m \ £ is cr-finite. 
Thus, we can partition £ as £ = P a irwise 


disjoint sets Ti satisfying M’ m (fFi) < 00 . For A\ C R Ml 
Jf’ 1711 -measurable and A 2 C R M2 Jff” 712 -measurable, we have 


'(A 1 xA 2 )n£ 


g(x,y)dJ^ m (x,y) 


= E 


^ J(AixA2)r\J 7 i 


g(x,y) dJt? m (x,y) 


( a 


= E 


j€N 


xt 2 n£ 2 (AxA)n 


g(x,y') 

Jp s ( x iV') 


dM’ n 


m2 (x,y') 

x d Jf? m2 (y) 

(111) 


where (a) holds by the classical version of the general coarea 
formula [18, Th. 3.2.22] (note that £2 and F, have finite 
Hausdorff measure) and because J]f > 0 |g-almost 
everywhere. By either (i) or (ii), we can apply Fubini’s theorem 
in (111) and change the order of integration and summation. 
We thus obtain 


' (Ai xA2)n£ 


g( x , y) dJ^ m (x,y) 


A 2 ri£ 2 


E 


(Ai xA2)n 


g(x,y') 

Jp y (x,y') 


dJF r 


l (x,y’) 


x dJf? m2 (y) 


= J J dJr m - m 2 {x,y')dJtf m 2 (y) 

A 2 n£ 2 Mix^4 2 )n 
P y _1 ({i/})n£ 


= [ f dJ(? m - m2 (x,y')dJi? m2 (y) 

J _ J y (a:,y) 

A 2 n£ 2 (AixA 2 )r\ 

Py _1 ({y}) n£ 


(b) I" f g(x,y) 

JA 2 n£ 2 JA 1 n£(y'i JL{x,y) 


dJ(f m - rn2 (x)dJ^ m2 (y) 


where (a) holds because y' = y for all (x. y') £ p^" 1 ({y}), 
and (b) holds because the Hausdorff measure does not depend 
on the ambient space [14, Remark 2.48], i.e., integration 
with respect to on the affine subspace p y _1 ({y}) C 

jg)Mi+M 2 an( j on gMi j s identical. Thus, we have shown (110). 

We now prove the second part of Theorem 61. By [18, 
Th. 3.2.22], the sets p y ' 1 ({y})flJ r i are (to—TO 2 )-rectifiable for 
J^ m2 -almost every y £ E A/2 . By Property 5 in Lemma 4, the 
same holds for their union (Jiert Pv" 1 ({2/})n = P7 1 ({y})n 
£. The Lipschitz mapping p x : R^L+m 2 R m i, p x (a:,y) = 
x satisfies 


Px(p y HiyDnf) 

= {x£ R Ml : 3 y' £ R M2 with ( x,y') £ p- x ({y}) n£} 
= {x £ R Ml : (x, y) £ £} 

= £^ . 


Thus, £(«) is obtained via a Lipschitz mapping from the set 
p y ^ 1 ({y}) n £, which is (to —TO 2 )-rectifiable for J^ m2 -almost 
every y £ ]R M2 . Therefore, by Property 3 in Lemma 4, £^> 
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is again (to — m 2 )-rectifiable 19 for M >rn ' 2 -almost every y G 
R A/a . Finally, by Property 1 in Lemma 4, the same is true for 
AiC \£ (j,) . ■ 

We now proceed to the proof of Theorem 31. 

Proof of Property 1: We have for any Jf” 7 ’ -2 -measurable set 
A C R M2 


py H- 4 ) 

= Pr{y G A} 


(14) 

(a) 


Pr{(x, y) G R Ml xA} 

[ 0& y) (*,w) djr m (x,y) 

J(R M ixA)n£ 

Q'Wl' fgg 

{ *f, 1 - djr m " m2 (*) dJT™ 2 (y) 

.(.vi Jp a x iU) 

6Y 1 fx.y) 

( 7 ' (x) ( y) (112) 

X lV) 



where in (a) we used (110) for g(x,y) = 0)^ y ^(x,y) > 0. 
For an Jt ? m2 -measurable set A satisfying j4? m2 \g 2 (A) = 0, 
(112) implies py~ 1 (A) = 0, i.e., /ty^ 1 <C M' rn ‘ 1 \^. Thus, 
according to Definition 11, y is m. 2 -rectifiable. 

Proof of Property 2: Because py~ l <C M >m2 \^, it follows 
from Property 4 in Corollary 12 that there exists a support 
£2 C £ 2 of the random variable y. 

Proof of Property 3: From (112), we see that 



g g, y pO*Vy) 
Jp,(x,y) 


dJ^ m ~ m2 (x) 


dpy 1 

dJf' m2 |^ 2 


(y) = o™ 2 (y) 


and some support £2 Q £ 2 - Let A\ C R Ml and A 2 Q R M2 be 
.yf '" 1 -measurable and J^ 1 ™ 2 -measurable, respectively. Then 


Pr{(x, y) G Ai x A 2 } 


(44) 

I Pr{x G A-i 

|y = 


Ia 2 


(13) 

/ Pr{x G Ai 

|y = 


Ia 2 


(a) 

/ Pr{x G Ai 

|y = 


t A 2 



y}dpy l {y) 
y}e™ 2 (y)djr m2 \£ 2 (y) 
y}e™ 2 {y)djr m2 \ i2 (y) (113) 


where (a) holds because we can choose Of 12 (y) = 0 for y G 
£$. On the other hand, we have 


Pr{(x,y) G Ai xA 2 } 
(14) f 


i(AixA 2 )n£ 


0^ y) (x,y)djr m (x,y) 


(a) 


fl£, y )( x >y) 

/-4 2 n£ 2 JaiC £(«) Jp y (x,y) 

g fr,y )( X >V) 

/ A 2 JA 1 n£(y~> Jp y 


dj4? m - m2 (x)dJ4f m2 (y) 


r r Qpri' (x xj') 


(114) 


where in (a) we used (110) for g(x,y) = 0(™ y )(a;,y) > 
0. Combining (113) and (114), we obtain that for M’ m2 \^- 
almost every y and every -measurable set A\ C R Ml 


where the latter equation holds because py~ 1 <C 
This implies (39). 

Proof of Property 4: Using (39) in (21) and proceeding 
similarly to (112), we obtain 


Pr{x G *4i | y = y}d™ 2 (y) 


/A 1 n£(y') 


g g,y fay) 

Jp y ( x ,y) 


dJf"n-m 2 (x) . 


(115) 


b m2 (y) 



x log 



g fr, y fay) 


dJf™-™ 2 




d (ly )( X ^y) 

Jp y ( x ,y) 


djr m-m 2 (-) \ dJ ^m 2 ^ 


Because (115) holds for ^ m2 |g 2 -almost every y and £ 2 C £ 2 , 
(115) also holds for M’ m ‘ 2 \g 2 -almost every y. Furthermore, 
because £2 is a support of y, we have 0f l2 {y) > 0 <A? m2 \£ 2 - 
almost everywhere. Thus, we obtain for M 1 '" 2 t c 2 -almost every 
y and every -measurable set A\ C R Ml 


= -jf^y) (*>!/) 

O r 0 ^ (x xj^) \ 

I (*,y) » ' djr m - m2 (x))dJf m (x,y). 

£(y) Jp y (x,y) ) 

Thus, (40) holds. 


Pr{x G Ai | y = y} 


r d (*,y fay) 

Ainw J pfa,y) e T(y) 


dj? 11 


! { x ) 



g £y fay) 

Jpfa,y) K 2 (y) 


djf m - m2 \ £(y) (x). 


(116) 


Appendix F 
Proof of Theorem 35 

Proof of Property 1: By Theorem 31, the random variable y 
is TO 2 -rectifiable with Hausdorff density Of 12 (given by (39)) 

19 Note that 8^ is -measurable because 1 ({j/}) PI £ is 

m 2 - measura bl e and the Hausdorff measure does not depend on the 
ambient space [14, Remark 2.48]. 


Therefore, Pr{x G • | y = y} < J4P m - m2 \^ v) . By Theo¬ 
rem 61, the set £( y ' > is (to — m 2 )-rectifiable for ^ OTra2 -almost 
every y. Hence, according to Definition 6, Pr{x G • | y = y} 
is (to — TO 2 )-rectifiable for -almost every y. 

Proof of Property 2: By (116), we have { x ) = 

je 9 ^y) X e f 2 {y) for 2 -almost every y. Thus, (10) im¬ 

plies (45). 
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Appendix G 
Proof of Theorem 39 


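Starting from (47), we have

$$
\begin{aligned}
h^{m-m_2}(\mathsf{x} \mid \mathsf{y})
&= -\int_{\mathcal{E}_2} \theta^{m_2}_{\mathsf{y}}(y) \int_{\mathcal{E}^{(y)}} \theta^{m-m_2}_{\Pr\{\mathsf{x}\in\cdot\mid\mathsf{y}=y\}}(x) \log \theta^{m-m_2}_{\Pr\{\mathsf{x}\in\cdot\mid\mathsf{y}=y\}}(x)\, d\mathscr{H}^{m-m_2}(x)\, d\mathscr{H}^{m_2}(y) \\
&\overset{(116)}{=} -\int_{\mathcal{E}_2} \int_{\mathcal{E}^{(y)}} \frac{\theta^m_{(\mathsf{x},\mathsf{y})}(x,y)}{J^{\mathcal{E}}_{p_{\mathsf{y}}}(x,y)} \log\!\left(\frac{\theta^m_{(\mathsf{x},\mathsf{y})}(x,y)}{J^{\mathcal{E}}_{p_{\mathsf{y}}}(x,y)\, \theta^{m_2}_{\mathsf{y}}(y)}\right) d\mathscr{H}^{m-m_2}(x)\, d\mathscr{H}^{m_2}(y) \\
&\overset{(a)}{=} -\int_{\mathcal{E}} \theta^m_{(\mathsf{x},\mathsf{y})}(x,y) \log\!\left(\frac{\theta^m_{(\mathsf{x},\mathsf{y})}(x,y)}{J^{\mathcal{E}}_{p_{\mathsf{y}}}(x,y)\, \theta^{m_2}_{\mathsf{y}}(y)}\right) d\mathscr{H}^m(x,y) \\
&= -\mathrm{E}_{(\mathsf{x},\mathsf{y})}\!\left[\log \frac{\theta^m_{(\mathsf{x},\mathsf{y})}(\mathsf{x},\mathsf{y})}{\theta^{m_2}_{\mathsf{y}}(\mathsf{y})}\right] + \mathrm{E}_{(\mathsf{x},\mathsf{y})}\!\left[\log J^{\mathcal{E}}_{p_{\mathsf{y}}}(\mathsf{x},\mathsf{y})\right]
\end{aligned}
$$

where in (a) we used (110) with $g(x,y) = \theta^m_{(\mathsf{x},\mathsf{y})}(x,y) \log\big(\theta^m_{(\mathsf{x},\mathsf{y})}(x,y) / (J^{\mathcal{E}}_{p_{\mathsf{y}}}(x,y)\, \theta^{m_2}_{\mathsf{y}}(y))\big)$. (Here, $g(x,y)$ is $\mathscr{H}^m|_{\mathcal{E}}$-integrable by our assumption in Theorem 39 that the right-hand side of (48) exists and is finite, i.e., Condition (ii) in Theorem 61 is satisfied.) Thus, (48) holds.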
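Formula (48) can be illustrated in a hypothetical example that is not taken from the text: $(\mathsf{x},\mathsf{y})$ uniform on the unit circle, so that $m - m_2 = 0$ and, given $\mathsf{y} = y$, $\mathsf{x}$ is one of the two points $\pm\sqrt{1-y^2}$ with equal probability. The $0$-dimensional conditional entropy should then equal $\log 2$, the entropy of a fair binary choice. The sketch below evaluates the two expectations in (48) separately with a midpoint rule in the angle $\phi$, assuming the arcsine density $1/(\pi\sqrt{1-y^2})$ for $\mathsf{y}$ and the tangential Jacobian $|\cos\phi|$.

```python
import math

# Hypothetical example: (x, y) uniform on the unit circle, angle phi uniform.
# Given y, x is one of the two points +-sqrt(1 - y^2), each with mass 1/2.


def conditional_entropy_terms(n_points=100_000):
    """Midpoint-rule values of the two expectations in (48), in nats."""
    theta_joint = 1.0 / (2.0 * math.pi)  # H^1-density of (x, y)
    step = 2.0 * math.pi / n_points
    density_term = 0.0   # -E[log(theta_(x,y) / theta_y)]
    jacobian_term = 0.0  # +E[log J]
    for k in range(n_points):
        phi = (k + 0.5) * step
        abs_x = abs(math.cos(phi))          # equals sqrt(1 - y^2), y = sin(phi)
        theta_y = 1.0 / (math.pi * abs_x)   # arcsine density of y
        density_term -= math.log(theta_joint / theta_y) / n_points
        jacobian_term += math.log(abs_x) / n_points  # J = |cos(phi)|
    return density_term, jacobian_term


d_term, j_term = conditional_entropy_terms()
print(d_term, j_term, d_term + j_term, math.log(2))
```

The first term approaches $2\log 2$ and the second $-\log 2$; the Jacobian term is exactly what turns the "naive" density ratio into the correct conditional entropy.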


Appendix H 
Proof of Theorem 46 

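We first note that the product measure $\mu\mathsf{x}^{-1} \times \mu\mathsf{y}^{-1}$ can be interpreted as the joint measure induced by the independent random variables $\tilde{\mathsf{x}}$ and $\tilde{\mathsf{y}}$, where $\tilde{\mathsf{x}}$ has the same distribution as $\mathsf{x}$ and $\tilde{\mathsf{y}}$ has the same distribution as $\mathsf{y}$. Because $\mathsf{x}$ is $m_1$-rectifiable and $\mathsf{y}$ is $m_2$-rectifiable, the same holds for $\tilde{\mathsf{x}}$ and $\tilde{\mathsf{y}}$, respectively. Furthermore, the Hausdorff densities satisfy $\theta^{m_1}_{\tilde{\mathsf{x}}}(x) = \theta^{m_1}_{\mathsf{x}}(x)$ and $\theta^{m_2}_{\tilde{\mathsf{y}}}(y) = \theta^{m_2}_{\mathsf{y}}(y)$. By Properties 1-3 in Theorem 28, the joint random variable $(\tilde{\mathsf{x}},\tilde{\mathsf{y}})$ is $(m_1+m_2)$-rectifiable with $(m_1+m_2)$-dimensional Hausdorff density

$$
\theta^{m_1+m_2}_{(\tilde{\mathsf{x}},\tilde{\mathsf{y}})}(x,y) = \theta^{m_1}_{\tilde{\mathsf{x}}}(x)\,\theta^{m_2}_{\tilde{\mathsf{y}}}(y) = \theta^{m_1}_{\mathsf{x}}(x)\,\theta^{m_2}_{\mathsf{y}}(y) \tag{117}
$$

and $\mu(\tilde{\mathsf{x}},\tilde{\mathsf{y}})^{-1} \ll \mathscr{H}^{m_1+m_2}|_{\mathcal{E}_1\times\mathcal{E}_2}$. The rectifiability of $(\tilde{\mathsf{x}},\tilde{\mathsf{y}})$ with $\mu(\tilde{\mathsf{x}},\tilde{\mathsf{y}})^{-1} \ll \mathscr{H}^{m_1+m_2}|_{\mathcal{E}_1\times\mathcal{E}_2}$ implies that the measure $\mu\mathsf{x}^{-1} \times \mu\mathsf{y}^{-1}$ is $(m_1+m_2)$-rectifiable and

$$
\mu\mathsf{x}^{-1} \times \mu\mathsf{y}^{-1} \ll \mathscr{H}^{m_1+m_2}|_{\mathcal{E}_1\times\mathcal{E}_2}. \tag{118}
$$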

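Proof of Part 1 (case $m = m_1 + m_2$): For any $\mathscr{H}^m$-measurable set $\mathcal{A} \subseteq \mathbb{R}^{M_1+M_2}$, we have

$$
\begin{aligned}
\mu(\mathsf{x},\mathsf{y})^{-1}(\mathcal{A})
&= \int_{\mathcal{A}} \theta^m_{(\mathsf{x},\mathsf{y})}(x,y)\, d\mathscr{H}^m|_{\mathcal{E}}(x,y) \\
&\overset{(a)}{=} \int_{\mathcal{A}} \theta^m_{(\mathsf{x},\mathsf{y})}(x,y)\, d\mathscr{H}^m|_{\mathcal{E}_1\times\mathcal{E}_2}(x,y) \\
&\overset{(b)}{=} \int_{\mathcal{A}} \frac{\theta^m_{(\mathsf{x},\mathsf{y})}(x,y)}{\theta^{m_1}_{\mathsf{x}}(x)\,\theta^{m_2}_{\mathsf{y}}(y)}\, \theta^{m_1}_{\mathsf{x}}(x)\,\theta^{m_2}_{\mathsf{y}}(y)\, d\mathscr{H}^m|_{\mathcal{E}_1\times\mathcal{E}_2}(x,y) \\
&\overset{(117)}{=} \int_{\mathcal{A}} \frac{\theta^m_{(\mathsf{x},\mathsf{y})}(x,y)}{\theta^{m_1}_{\mathsf{x}}(x)\,\theta^{m_2}_{\mathsf{y}}(y)}\, \theta^{m}_{(\tilde{\mathsf{x}},\tilde{\mathsf{y}})}(x,y)\, d\mathscr{H}^m|_{\mathcal{E}_1\times\mathcal{E}_2}(x,y) \\
&\overset{(c)}{=} \int_{\mathcal{A}} \frac{\theta^m_{(\mathsf{x},\mathsf{y})}(x,y)}{\theta^{m_1}_{\mathsf{x}}(x)\,\theta^{m_2}_{\mathsf{y}}(y)}\, d(\mu\mathsf{x}^{-1}\times\mu\mathsf{y}^{-1})(x,y).
\end{aligned}
\tag{119}
$$

Here, (a) holds because $\mathcal{E} \subseteq \mathcal{E}_1 \times \mathcal{E}_2$ and because we can choose $\theta^m_{(\mathsf{x},\mathsf{y})}(x,y) = 0$ on $\mathcal{E}^c$, (b) holds because $\theta^{m_1}_{\mathsf{x}}(x)\,\theta^{m_2}_{\mathsf{y}}(y) > 0$ $\mathscr{H}^m$-almost everywhere on $\mathcal{E}_1\times\mathcal{E}_2$, and (c) holds because, by (13), $\theta^{m}_{(\tilde{\mathsf{x}},\tilde{\mathsf{y}})} = d(\mu\mathsf{x}^{-1}\times\mu\mathsf{y}^{-1})/d\mathscr{H}^m|_{\mathcal{E}_1\times\mathcal{E}_2}$ $\mathscr{H}^m|_{\mathcal{E}_1\times\mathcal{E}_2}$-almost everywhere. By (119), we obtain that $\mu(\mathsf{x},\mathsf{y})^{-1} \ll \mu\mathsf{x}^{-1}\times\mu\mathsf{y}^{-1}$ with Radon-Nikodym derivative

$$
\frac{d\mu(\mathsf{x},\mathsf{y})^{-1}}{d(\mu\mathsf{x}^{-1}\times\mu\mathsf{y}^{-1})}(x,y) = \frac{\theta^m_{(\mathsf{x},\mathsf{y})}(x,y)}{\theta^{m_1}_{\mathsf{x}}(x)\,\theta^{m_2}_{\mathsf{y}}(y)}. \tag{120}
$$

Inserting (120) into (60) yields

$$
\begin{aligned}
I(\mathsf{x};\mathsf{y}) &= \int \log\!\left(\frac{\theta^m_{(\mathsf{x},\mathsf{y})}(x,y)}{\theta^{m_1}_{\mathsf{x}}(x)\,\theta^{m_2}_{\mathsf{y}}(y)}\right) d\mu(\mathsf{x},\mathsf{y})^{-1}(x,y) \\
&= \int_{\mathcal{E}} \theta^m_{(\mathsf{x},\mathsf{y})}(x,y) \log\!\left(\frac{\theta^m_{(\mathsf{x},\mathsf{y})}(x,y)}{\theta^{m_1}_{\mathsf{x}}(x)\,\theta^{m_2}_{\mathsf{y}}(y)}\right) d\mathscr{H}^m(x,y)
\end{aligned}
\tag{121}
$$

which is (62). Furthermore, we can rewrite (121) as

$$
\begin{aligned}
I(\mathsf{x};\mathsf{y}) &= \mathrm{E}_{(\mathsf{x},\mathsf{y})}\!\left[\log \frac{\theta^m_{(\mathsf{x},\mathsf{y})}(\mathsf{x},\mathsf{y})}{\theta^{m_1}_{\mathsf{x}}(\mathsf{x})\,\theta^{m_2}_{\mathsf{y}}(\mathsf{y})}\right] \\
&= \mathrm{E}_{(\mathsf{x},\mathsf{y})}\big[\log \theta^m_{(\mathsf{x},\mathsf{y})}(\mathsf{x},\mathsf{y})\big] - \mathrm{E}_{(\mathsf{x},\mathsf{y})}\big[\log \theta^{m_1}_{\mathsf{x}}(\mathsf{x})\big] - \mathrm{E}_{(\mathsf{x},\mathsf{y})}\big[\log \theta^{m_2}_{\mathsf{y}}(\mathsf{y})\big] \\
&\overset{(31)}{=} -h^m(\mathsf{x},\mathsf{y}) - \mathrm{E}_{\mathsf{x}}\big[\log \theta^{m_1}_{\mathsf{x}}(\mathsf{x})\big] - \mathrm{E}_{\mathsf{y}}\big[\log \theta^{m_2}_{\mathsf{y}}(\mathsf{y})\big] \\
&\overset{(19)}{=} -h^m(\mathsf{x},\mathsf{y}) + h^{m_1}(\mathsf{x}) + h^{m_2}(\mathsf{y})
\end{aligned}
\tag{122}
$$

which is (63). Finally, we obtain the first expression in (64) by inserting (56) into (122). The second expression in (64) is obtained by symmetry.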

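Proof of Part 2 (case $m < m_1 + m_2$): We first show that $\mu(\mathsf{x},\mathsf{y})^{-1} \not\ll \mu\mathsf{x}^{-1}\times\mu\mathsf{y}^{-1}$. To this end, we show that the assumption $\mu(\mathsf{x},\mathsf{y})^{-1} \ll \mu\mathsf{x}^{-1}\times\mu\mathsf{y}^{-1}$ leads to a contradiction. Using (118), we have $\mu(\mathsf{x},\mathsf{y})^{-1} \ll \mu\mathsf{x}^{-1}\times\mu\mathsf{y}^{-1} \ll \mathscr{H}^{m_1+m_2}|_{\mathcal{E}_1\times\mathcal{E}_2}$. By Property 4 in Lemma 4 and because $\mathcal{E}$ is an $m$-rectifiable set and $m_1+m_2 > m$, we obtain $\mathscr{H}^{m_1+m_2}(\mathcal{E}) = 0$. This implies $\mathscr{H}^{m_1+m_2}|_{\mathcal{E}_1\times\mathcal{E}_2}(\mathcal{E}) = 0$. On the other hand, by (16), $\mu(\mathsf{x},\mathsf{y})^{-1}(\mathcal{E}) = 1$. Thus, we have a contradiction to $\mu(\mathsf{x},\mathsf{y})^{-1} \ll \mathscr{H}^{m_1+m_2}|_{\mathcal{E}_1\times\mathcal{E}_2}$. Hence, $\mu(\mathsf{x},\mathsf{y})^{-1} \not\ll \mu\mathsf{x}^{-1}\times\mu\mathsf{y}^{-1}$ and, by (61), $I(\mathsf{x};\mathsf{y}) = \infty$.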
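In the purely continuous special case, (121) and (122) can be checked against a closed form. The sketch below assumes $\mathscr{H}^M$ on $\mathbb{R}^M$ is normalized to coincide with Lebesgue measure, so that $h^M$ is the usual differential entropy, and takes a standard bivariate Gaussian with correlation $\rho$ ($m_1 = m_2 = 1$, $m = 2$), for which $I(\mathsf{x};\mathsf{y}) = -\tfrac{1}{2}\log(1-\rho^2)$. It compares $h^1(\mathsf{x}) + h^1(\mathsf{y}) - h^2(\mathsf{x},\mathsf{y})$ with a Monte Carlo estimate of the expectation in (121); the function names, seed, and sample size are illustrative choices.

```python
import math
import random


def mi_from_entropies(rho):
    """h(x) + h(y) - h(x, y) in nats for a standard bivariate Gaussian, cf. (122)."""
    h_x = 0.5 * math.log(2 * math.pi * math.e)  # h(y) has the same value
    h_xy = math.log(2 * math.pi * math.e) + 0.5 * math.log(1 - rho ** 2)
    return 2 * h_x - h_xy


def mi_monte_carlo(rho, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[log(theta_(x,y) / (theta_x * theta_y))], cf. (121)."""
    rng = random.Random(seed)
    s = math.sqrt(1 - rho ** 2)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + s * rng.gauss(0.0, 1.0)  # corr(x, y) = rho
        log_joint = (-math.log(2 * math.pi * s)
                     - (x * x - 2 * rho * x * y + y * y) / (2 * s * s))
        log_product = -math.log(2 * math.pi) - (x * x + y * y) / 2
        total += log_joint - log_product
    return total / n_samples


print(mi_from_entropies(0.8))  # approx 0.5108 = -0.5 * log(0.36)
print(mi_monte_carlo(0.8))     # close to the value above
```

By contrast, taking $\mathsf{y} = \mathsf{x}$ gives a $1$-rectifiable pair in $\mathbb{R}^2$, which is the situation of Part 2 with $m = 1 < m_1 + m_2 = 2$, and the mutual information is infinite.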
Appendix I
Proof of Lemma 52 


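Let $\mathcal{P}$ denote the set of all finite, measurable partitions of $\mathbb{R}^M$, i.e., for $\mathcal{Q} = \{\mathcal{A}_1,\dots,\mathcal{A}_N\} \in \mathcal{P}$ the sets $\mathcal{A}_i$ are mutually disjoint, measurable, and satisfy $\bigcup_{i=1}^N \mathcal{A}_i = \mathbb{R}^M$. Using the interpretation of $h^m(\mathsf{x})$ as a generalized entropy with respect to the Hausdorff measure $\mathscr{H}^m|_{\mathcal{E}}$ (cf. Remark 19), we obtain by [12, eq. (1.8)]

$$
h^m(\mathsf{x}) = -\sup_{\mathcal{Q}\in\mathcal{P}} \sum_{\mathcal{A}\in\mathcal{Q}} \mu\mathsf{x}^{-1}(\mathcal{A}) \log\!\left(\frac{\mu\mathsf{x}^{-1}(\mathcal{A})}{\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A})}\right). \tag{123}
$$

Because $\mu\mathsf{x}^{-1}(\mathcal{E}^c) = 0$ and $\mathscr{H}^m|_{\mathcal{E}}(\mathcal{E}^c) = 0$, we have for all $\mathcal{Q} \in \mathcal{P}$

$$
\begin{aligned}
\sum_{\mathcal{A}\in\mathcal{Q}} \mu\mathsf{x}^{-1}(\mathcal{A}) \log\!\left(\frac{\mu\mathsf{x}^{-1}(\mathcal{A})}{\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A})}\right)
&= \sum_{\mathcal{A}\in\mathcal{Q}} \mu\mathsf{x}^{-1}(\mathcal{A}\cap\mathcal{E}) \log\!\left(\frac{\mu\mathsf{x}^{-1}(\mathcal{A}\cap\mathcal{E})}{\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}\cap\mathcal{E})}\right) \\
&= \sum_{\mathcal{A}'\in\tilde{\mathcal{Q}}} \mu\mathsf{x}^{-1}(\mathcal{A}') \log\!\left(\frac{\mu\mathsf{x}^{-1}(\mathcal{A}')}{\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}')}\right)
\end{aligned}
\tag{124}
$$

where $\tilde{\mathcal{Q}} = \{\mathcal{A}\cap\mathcal{E} : \mathcal{A}\in\mathcal{Q}\} \in \mathcal{P}_{m,\infty}$. Hence, for every $\mathcal{Q}\in\mathcal{P}$ there exists a $\tilde{\mathcal{Q}}\in\mathcal{P}_{m,\infty}$ such that (124) holds. Thus, the supremum in (123) does not change if we replace $\mathcal{P}$ by $\mathcal{P}_{m,\infty}$, i.e., we obtain further

$$
\begin{aligned}
h^m(\mathsf{x}) &= -\sup_{\mathcal{Q}\in\mathcal{P}_{m,\infty}} \sum_{\mathcal{A}\in\mathcal{Q}} \mu\mathsf{x}^{-1}(\mathcal{A}) \log\!\left(\frac{\mu\mathsf{x}^{-1}(\mathcal{A})}{\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A})}\right) \\
&= \inf_{\mathcal{Q}\in\mathcal{P}_{m,\infty}} \left(-\sum_{\mathcal{A}\in\mathcal{Q}} \mu\mathsf{x}^{-1}(\mathcal{A}) \log\!\left(\frac{\mu\mathsf{x}^{-1}(\mathcal{A})}{\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A})}\right)\right) \qquad (125) \\
&= \inf_{\mathcal{Q}\in\mathcal{P}_{m,\infty}} \left(-\sum_{\mathcal{A}\in\mathcal{Q}} \mu\mathsf{x}^{-1}(\mathcal{A}) \log \mu\mathsf{x}^{-1}(\mathcal{A}) + \sum_{\mathcal{A}\in\mathcal{Q}} \mu\mathsf{x}^{-1}(\mathcal{A}) \log \mathscr{H}^m|_{\mathcal{E}}(\mathcal{A})\right) \\
&= \inf_{\mathcal{Q}\in\mathcal{P}_{m,\infty}} \left(H([\mathsf{x}]_{\mathcal{Q}}) + \sum_{\mathcal{A}\in\mathcal{Q}} \mu\mathsf{x}^{-1}(\mathcal{A}) \log \mathscr{H}^m|_{\mathcal{E}}(\mathcal{A})\right). \qquad (126)
\end{aligned}
$$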
Here, (125) is (68) and (126) is (69). 
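The characterization (69) suggests a numerical approximation of $h^m(\mathsf{x})$: evaluate $H([\mathsf{x}]_{\mathcal{Q}}) + \sum_{\mathcal{A}\in\mathcal{Q}}\mu\mathsf{x}^{-1}(\mathcal{A})\log\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A})$ on finer and finer partitions. The sketch below assumes a hypothetical $1$-rectifiable source on the unit circle with density $(1 + a\cos\phi)/(2\pi)$ with respect to arc length ($\mathscr{H}^1$), partitions the circle into equal arcs, and compares the result with $h^1(\mathsf{x}) = -\int \theta \log\theta\, d\mathscr{H}^1$ computed by direct integration. Because $h^1(\mathsf{x})$ is the infimum in (69), every partition value should lie above it.

```python
import math

A = 0.5  # hypothetical shape parameter of the density, |A| < 1


def arc_probability(phi0, phi1, a=A):
    """mu x^{-1} of the arc {(cos t, sin t) : phi0 <= t < phi1}."""
    return (phi1 - phi0 + a * (math.sin(phi1) - math.sin(phi0))) / (2 * math.pi)


def partition_value(n_arcs, a=A):
    """H([x]_Q) + sum_A mu(A) log H^1(A) for n_arcs equal arcs, cf. (69)."""
    length = 2 * math.pi / n_arcs  # H^1 measure of every cell
    value = 0.0
    for i in range(n_arcs):
        p = arc_probability(i * length, (i + 1) * length, a)
        if p > 0:
            value += -p * math.log(p) + p * math.log(length)
    return value


def entropy_h1(a=A, n_points=100_000):
    """Reference value h^1(x) = -int theta log theta dH^1 (midpoint rule)."""
    step = 2 * math.pi / n_points
    total = 0.0
    for k in range(n_points):
        theta = (1 + a * math.cos((k + 0.5) * step)) / (2 * math.pi)
        total -= theta * math.log(theta) * step
    return total


h = entropy_h1()
for n in (4, 16, 64, 1000):
    print(n, partition_value(n) - h)  # nonnegative gaps shrinking toward 0
```

For the uniform case $a = 0$, every equal-arc partition already gives $\log(2\pi)$, the $1$-dimensional entropy of the uniform distribution on the circle.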

Appendix J 
Proof of Theorem 53 


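J.1 Proof of Lower Bound (70)

Let $\mathcal{Q} \in \mathcal{P}_{m,\delta}$ be an $(m,\delta)$-partition of $\mathcal{E}$ according to Definition 51, i.e., $\mathcal{Q} = \{\mathcal{A}_1,\dots,\mathcal{A}_N\}$ where $\bigcup_{i=1}^N \mathcal{A}_i = \mathcal{E}$, $\mathcal{A}_i \cap \mathcal{A}_j = \emptyset$, and $\mathscr{H}^m(\mathcal{A}_i) \le \delta$ for all $i,j \in \{1,\dots,N\}$, $i \ne j$. Note that $\mathcal{Q}$ also belongs to $\mathcal{P}_{m,\infty}$. Then, starting from (69), we obtain

$$
\begin{aligned}
h^m(\mathsf{x}) &= \inf_{\mathcal{Q}'\in\mathcal{P}_{m,\infty}} \left(H([\mathsf{x}]_{\mathcal{Q}'}) + \sum_{\mathcal{A}\in\mathcal{Q}'} \mu\mathsf{x}^{-1}(\mathcal{A}) \log \mathscr{H}^m|_{\mathcal{E}}(\mathcal{A})\right) \\
&\le H([\mathsf{x}]_{\mathcal{Q}}) + \sum_{i=1}^N \mu\mathsf{x}^{-1}(\mathcal{A}_i) \log \mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_i) \\
&\overset{(a)}{\le} H([\mathsf{x}]_{\mathcal{Q}}) + \sum_{i=1}^N \mu\mathsf{x}^{-1}(\mathcal{A}_i) \log \delta \\
&\overset{(b)}{=} H([\mathsf{x}]_{\mathcal{Q}}) + \log \delta
\end{aligned}
$$

where (a) holds because $\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_i) \le \delta$ and (b) holds because $\sum_{i=1}^N \mu\mathsf{x}^{-1}(\mathcal{A}_i) = \mu\mathsf{x}^{-1}(\mathcal{E}) = 1$. Multiplying by $\operatorname{ld} e$, we have equivalently

$$
\big(h^m(\mathsf{x}) - \log\delta\big) \operatorname{ld} e \le H([\mathsf{x}]_{\mathcal{Q}}) \operatorname{ld} e. \tag{127}
$$

By (67), we have

$$
H([\mathsf{x}]_{\mathcal{Q}}) \operatorname{ld} e \le L^*([\mathsf{x}]_{\mathcal{Q}}). \tag{128}
$$

Combining (127) and (128), we obtain

$$
\big(h^m(\mathsf{x}) - \log\delta\big) \operatorname{ld} e \le L^*([\mathsf{x}]_{\mathcal{Q}})
$$

which implies (70).


J.2 Proof of Upper Bound (71)

We first state a preliminary result.

Lemma 62: Let $\mathsf{x}$ be an $m$-rectifiable random variable, i.e., $\mu\mathsf{x}^{-1} \ll \mathscr{H}^m|_{\mathcal{E}}$ for an $m$-rectifiable set $\mathcal{E} \subseteq \mathbb{R}^M$, with $m \in \{1,\dots,M\}$ and $\mathscr{H}^m(\mathcal{E}) < \infty$. Furthermore, let $\mathcal{Q} = \{\mathcal{A}_1,\dots,\mathcal{A}_N\} \in \mathcal{P}_{m,\infty}$, where each $\mathcal{A}_i$ is constructed as the union of disjoint sets $\mathcal{A}_{i,1},\dots,\mathcal{A}_{i,k_i}$, i.e., $\mathcal{A}_i = \bigcup_{j=1}^{k_i} \mathcal{A}_{i,j}$ with $\mathcal{A}_{i,j_1}\cap\mathcal{A}_{i,j_2} = \emptyset$ for $j_1 \ne j_2$. Finally, let $\hat{\mathcal{Q}} = \{\mathcal{A}_{1,1},\dots,\mathcal{A}_{1,k_1},\dots,\mathcal{A}_{N,1},\dots,\mathcal{A}_{N,k_N}\}$. Then

$$
-\sum_{\mathcal{A}\in\mathcal{Q}} \mu\mathsf{x}^{-1}(\mathcal{A}) \log\!\left(\frac{\mu\mathsf{x}^{-1}(\mathcal{A})}{\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A})}\right) \ge -\sum_{\mathcal{A}\in\hat{\mathcal{Q}}} \mu\mathsf{x}^{-1}(\mathcal{A}) \log\!\left(\frac{\mu\mathsf{x}^{-1}(\mathcal{A})}{\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A})}\right). \tag{129}
$$

Proof: The inequality (129) can be written as

$$
-\sum_{i=1}^N \mu\mathsf{x}^{-1}(\mathcal{A}_i) \log\!\left(\frac{\mu\mathsf{x}^{-1}(\mathcal{A}_i)}{\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_i)}\right) \ge -\sum_{i=1}^N \sum_{j=1}^{k_i} \mu\mathsf{x}^{-1}(\mathcal{A}_{i,j}) \log\!\left(\frac{\mu\mathsf{x}^{-1}(\mathcal{A}_{i,j})}{\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_{i,j})}\right).
$$

Therefore, it suffices to show that

$$
\mu\mathsf{x}^{-1}(\mathcal{A}_i) \log\!\left(\frac{\mu\mathsf{x}^{-1}(\mathcal{A}_i)}{\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_i)}\right) \le \sum_{j=1}^{k_i} \mu\mathsf{x}^{-1}(\mathcal{A}_{i,j}) \log\!\left(\frac{\mu\mathsf{x}^{-1}(\mathcal{A}_{i,j})}{\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_{i,j})}\right)
$$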
for $i \in \{1,\dots,N\}$. This latter inequality follows from the log sum inequality [13, Th. 2.7.1]. ■
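Inequality (129) says that refining a partition can only decrease $-\sum_{\mathcal{A}} \mu\mathsf{x}^{-1}(\mathcal{A})\log\big(\mu\mathsf{x}^{-1}(\mathcal{A})/\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A})\big)$. The sketch below checks this on abstract cells given as (probability, measure) pairs. Each cell is split with random shares, so that probabilities and measures both remain additive as in $\mathcal{A}_i = \bigcup_j \mathcal{A}_{i,j}$. The random cell values are purely illustrative and are not derived from any particular source.

```python
import math
import random


def neg_kl_sum(cells):
    """-sum p log(p / h) over cells (p, h); cells with p = 0 contribute 0."""
    return -sum(p * math.log(p / h) for p, h in cells if p > 0)


def random_refinement(cells, rng, max_pieces=4):
    """Split each cell (p, h) into up to max_pieces sub-cells.

    The sub-cell probabilities sum to p and the sub-cell measures sum to h,
    mirroring A_i = union of the disjoint sets A_{i,1}, ..., A_{i,k_i}.
    """
    refined = []
    for p, h in cells:
        k = rng.randint(1, max_pieces)
        wp = [rng.random() + 1e-3 for _ in range(k)]
        wh = [rng.random() + 1e-3 for _ in range(k)]
        sp, sh = sum(wp), sum(wh)
        refined.extend((p * u / sp, h * v / sh) for u, v in zip(wp, wh))
    return refined


rng = random.Random(1)
raw = [rng.random() + 1e-3 for _ in range(6)]
coarse = [(r / sum(raw), rng.uniform(0.1, 2.0)) for r in raw]
fine = random_refinement(coarse, rng)
print(neg_kl_sum(coarse) >= neg_kl_sum(fine) - 1e-12)  # True, as (129) predicts
```

The small tolerance only absorbs floating-point rounding; the inequality itself is exact by the log sum inequality.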

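We now proceed to the proof of (71). By (68), for each $\varepsilon' > 0$, there exists a partition $\mathcal{Q} = \{\mathcal{A}_1,\dots,\mathcal{A}_N\} \in \mathcal{P}_{m,\infty}$ such that

$$
h^m(\mathsf{x}) > -\sum_{i=1}^N \mu\mathsf{x}^{-1}(\mathcal{A}_i) \log\!\left(\frac{\mu\mathsf{x}^{-1}(\mathcal{A}_i)}{\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_i)}\right) - \varepsilon'. \tag{130}
$$

Let us choose, in particular, $\varepsilon' = \varepsilon/(2\operatorname{ld} e)$. We define

$$
\delta_\varepsilon = \big(1 - e^{-\varepsilon'}\big) \min_{\substack{i\in\{1,\dots,N\}\\ \mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_i)\ne 0}} \mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_i) > 0. \tag{131}
$$

Choosing some $\delta \in (0, \delta_\varepsilon)$, we furthermore define

$$
J_{i,\delta} = \frac{\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_i)}{\delta}
$$

and

$$
M_{i,\delta} = \begin{cases} \lceil J_{i,\delta}\rceil & \text{if } \mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_i) \ne 0 \\ 1 & \text{if } \mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_i) = 0. \end{cases}
$$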




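Let us partition each set $\mathcal{A}_i \in \mathcal{Q}$ into $M_{i,\delta}$ disjoint subsets $\mathcal{A}_{i,j}$ of equal Hausdorff measure,$^{20}$ i.e.,

$$
\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_{i,j}) = \frac{\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_i)}{M_{i,\delta}} \quad\text{and}\quad \bigcup_{j=1}^{M_{i,\delta}} \mathcal{A}_{i,j} = \mathcal{A}_i.
$$

For $\mathcal{A}_i \in \mathcal{Q}$ such that $\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_i) = 0$, we have $M_{i,\delta} = 1$, and thus this partition degenerates to $\mathcal{A}_{i,1} = \mathcal{A}_i$, which implies $\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_{i,1}) = 0$. For $\mathcal{A}_i \in \mathcal{Q}$ such that $\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_i) \ne 0$, we have $M_{i,\delta} = \lceil J_{i,\delta}\rceil$ and thus

$$
\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_{i,j}) = \frac{\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_i)}{\lceil J_{i,\delta}\rceil} = \frac{J_{i,\delta}}{\lceil J_{i,\delta}\rceil}\,\delta \le \delta. \tag{132}
$$

In either case we have $\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_{i,j}) \le \delta$.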

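20 Because $\mathscr{H}^m$ is a nonatomic measure, we can always find subsets of arbitrary but smaller measure (see [29, Sec. 2.5]).

Let us denote by $\mathcal{Q}_\delta$ the partition of $\mathcal{E}$ containing all the sets $\mathcal{A}_{i,j}$. Then $\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_{i,j}) \le \delta$ implies $\mathcal{Q}_\delta \in \mathcal{P}_{m,\delta}$. Furthermore, for $\mathcal{A}_{i,j}\in\mathcal{Q}_\delta$ satisfying $\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_{i,j}) \ne 0$,

$$
\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_{i,j}) = \frac{J_{i,\delta}}{\lceil J_{i,\delta}\rceil}\,\delta = \left(1 - \frac{\lceil J_{i,\delta}\rceil - J_{i,\delta}}{\lceil J_{i,\delta}\rceil}\right)\delta \overset{(a)}{>} \left(1 - \frac{1}{\lceil J_{i,\delta}\rceil}\right)\delta \tag{133}
$$

where (a) holds because $\lceil J_{i,\delta}\rceil - J_{i,\delta} < 1$. Furthermore, we can bound $\lceil J_{i,\delta}\rceil$ as follows (note that $\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_{i,j}) \ne 0$ implies $\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_i) \ne 0$):

$$
\lceil J_{i,\delta}\rceil \ge J_{i,\delta} = \frac{\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_i)}{\delta} \overset{(131)}{>} \frac{\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_i)}{(1-e^{-\varepsilon'}) \min_{i'\in\{1,\dots,N\},\ \mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_{i'})\ne 0} \mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_{i'})} \ge \frac{1}{1-e^{-\varepsilon'}}. \tag{134}
$$

Inserting (134) into (133), we obtain for all sets $\mathcal{A}_{i,j}\in\mathcal{Q}_\delta$ satisfying $\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_{i,j}) \ne 0$

$$
\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}_{i,j}) > \big(1 - (1 - e^{-\varepsilon'})\big)\delta = e^{-\varepsilon'}\delta. \tag{135}
$$

Combining our results yields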


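$$
\begin{aligned}
h^m(\mathsf{x}) &\overset{(130)}{>} -\sum_{\mathcal{A}\in\mathcal{Q}} \mu\mathsf{x}^{-1}(\mathcal{A}) \log\!\left(\frac{\mu\mathsf{x}^{-1}(\mathcal{A})}{\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A})}\right) - \varepsilon' \\
&\overset{(129)}{\ge} -\sum_{\mathcal{A}\in\mathcal{Q}_\delta} \mu\mathsf{x}^{-1}(\mathcal{A}) \log\!\left(\frac{\mu\mathsf{x}^{-1}(\mathcal{A})}{\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A})}\right) - \varepsilon' \\
&\overset{(a)}{=} -\sum_{\substack{\mathcal{A}\in\mathcal{Q}_\delta\\ \mathscr{H}^m|_{\mathcal{E}}(\mathcal{A})\ne 0}} \mu\mathsf{x}^{-1}(\mathcal{A}) \log\!\left(\frac{\mu\mathsf{x}^{-1}(\mathcal{A})}{\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A})}\right) - \varepsilon' \\
&\overset{(135)}{\ge} -\sum_{\substack{\mathcal{A}\in\mathcal{Q}_\delta\\ \mathscr{H}^m|_{\mathcal{E}}(\mathcal{A})\ne 0}} \mu\mathsf{x}^{-1}(\mathcal{A}) \log\!\left(\frac{\mu\mathsf{x}^{-1}(\mathcal{A})}{e^{-\varepsilon'}\delta}\right) - \varepsilon' \\
&\overset{(b)}{=} -\sum_{\mathcal{A}\in\mathcal{Q}_\delta} \mu\mathsf{x}^{-1}(\mathcal{A}) \log\!\left(\frac{\mu\mathsf{x}^{-1}(\mathcal{A})}{e^{-\varepsilon'}\delta}\right) - \varepsilon' \\
&\overset{(c)}{=} -\sum_{\mathcal{A}\in\mathcal{Q}_\delta} \mu\mathsf{x}^{-1}(\mathcal{A}) \log \mu\mathsf{x}^{-1}(\mathcal{A}) + \log\delta - 2\varepsilon' \\
&= H([\mathsf{x}]_{\mathcal{Q}_\delta}) + \log\delta - 2\varepsilon' \\
&\overset{(d)}{>} \frac{L^*([\mathsf{x}]_{\mathcal{Q}_\delta}) - 1}{\operatorname{ld} e} + \log\delta - 2\varepsilon'
\end{aligned}
\tag{136}
$$

where (a) and (b) hold because, by $\mu\mathsf{x}^{-1} \ll \mathscr{H}^m|_{\mathcal{E}}$, $\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}) = 0$ implies $\mu\mathsf{x}^{-1}(\mathcal{A}) = 0$ and thus the additional restriction $\mathscr{H}^m|_{\mathcal{E}}(\mathcal{A}) \ne 0$ removes only summands that are zero, (c) holds because $\mathcal{Q}_\delta$ is a partition of $\mathcal{E}$ and thus $\sum_{\mathcal{A}\in\mathcal{Q}_\delta}\mu\mathsf{x}^{-1}(\mathcal{A}) = \mu\mathsf{x}^{-1}(\mathcal{E}) = 1$, and (d) holds by the second inequality in (67). Finally, rewriting (136) gives (recall $\varepsilon = 2\varepsilon' \operatorname{ld} e$)

$$
L^*([\mathsf{x}]_{\mathcal{Q}_\delta}) < h^m(\mathsf{x}) \operatorname{ld} e - \log\delta \operatorname{ld} e + 1 + \varepsilon
$$

which is (71).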
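The bounds in Theorem 53 relate the optimal expected codeword length to $h^m(\mathsf{x}) - \log\delta$. The sketch below assumes a hypothetical source on the unit circle with density $(1 + a\cos\phi)/(2\pi)$ with respect to arc length, uses $N$ equal arcs of length $\delta = 2\pi/N$ as a $(1,\delta)$-partition, and computes $L^*$ with a binary Huffman code, which attains the optimal expected length for a finite alphabet. It checks the lower bound (70) and the upper bound $L^* < H \operatorname{ld} e + 1$ from (67); it does not construct the specific partition $\mathcal{Q}_\delta$ used in the proof of (71).

```python
import heapq
import math

A = 0.5  # hypothetical shape parameter; density (1 + A cos phi) / (2 pi)


def arc_probability(phi0, phi1, a=A):
    """Probability of the arc [phi0, phi1) under the density above."""
    return (phi1 - phi0 + a * (math.sin(phi1) - math.sin(phi0))) / (2 * math.pi)


def entropy_h1(a=A, n_points=100_000):
    """h^1(x) in nats, by the midpoint rule in the angle."""
    step = 2 * math.pi / n_points
    total = 0.0
    for k in range(n_points):
        theta = (1 + a * math.cos((k + 0.5) * step)) / (2 * math.pi)
        total -= theta * math.log(theta) * step
    return total


def huffman_expected_length(probs):
    """Expected length in bits of a binary Huffman code, i.e. L* for probs."""
    heap = [p for p in probs if p > 0]
    if len(heap) <= 1:
        return 0.0  # a single possible value needs no bits
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b  # merging adds one bit to every codeword below this node
        heapq.heappush(heap, a + b)
    return total


def check_bounds(n_arcs, h1):
    """Return (lower bound (70), L*, H ld e + 1) for n_arcs equal arcs."""
    delta = 2 * math.pi / n_arcs  # H^1 measure of each cell
    probs = [arc_probability(i * delta, (i + 1) * delta) for i in range(n_arcs)]
    entropy_nats = -sum(p * math.log(p) for p in probs if p > 0)
    ld_e = math.log2(math.e)
    lower = (h1 - math.log(delta)) * ld_e
    return lower, huffman_expected_length(probs), entropy_nats * ld_e + 1


h1 = entropy_h1()
for n in (8, 64, 512):
    lower, l_star, upper = check_bounds(n, h1)
    print(n, round(lower, 3), round(l_star, 3), round(upper, 3))
```

As $\delta$ shrinks, $L^*$ grows like $-\log_2\delta$, and the gap to $(h^1(\mathsf{x}) - \log\delta)\operatorname{ld} e$ stays below one bit, in line with (70) and (71).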


Appendix K 
Proof of Lemma 56 

Because $R_{\mathrm{SLB}}(D,s) = h^m(\mathsf{x}) - (sD + \log\gamma(s))$, where $h^m(\mathsf{x})$ is finite and does not depend on $s$, it is sufficient to show $\lim_{s\to\infty}(sD + \log\gamma(s)) = \infty$. For $y \in \mathbb{R}^M$, we define the set of all $x$ whose distortion relative to $y$ is less than $D/2$,

$$
\mathcal{C}(y) = \left\{x \in \mathbb{R}^M : d(x,y) < \frac{D}{2}\right\}.
$$
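We obtain

$$
\begin{aligned}
sD + \log\gamma(s) &\overset{(80)}{=} sD + \log\!\left(\sup_{y\in\mathbb{R}^M} \int_{\mathcal{E}} e^{-s d(x,y)}\, d\mathscr{H}^m(x)\right) \\
&\overset{(a)}{=} \sup_{y\in\mathbb{R}^M} \left(sD + \log \int_{\mathcal{E}} e^{-s d(x,y)}\, d\mathscr{H}^m(x)\right) \\
&= \sup_{y\in\mathbb{R}^M} \log\!\left(e^{sD} \int_{\mathcal{E}} e^{-s d(x,y)}\, d\mathscr{H}^m(x)\right) \\
&= \sup_{y\in\mathbb{R}^M} \log \int_{\mathcal{E}} e^{s(D - d(x,y))}\, d\mathscr{H}^m(x) \\
&\overset{(b)}{\ge} \sup_{y\in\mathbb{R}^M} \log \int_{\mathcal{E}\cap\mathcal{C}(y)} e^{s(D - d(x,y))}\, d\mathscr{H}^m(x) \\
&\overset{(c)}{\ge} \sup_{y\in\mathbb{R}^M} \log \int_{\mathcal{E}\cap\mathcal{C}(y)} e^{sD/2}\, d\mathscr{H}^m(x) \\
&= \sup_{y\in\mathbb{R}^M} \log\!\left(e^{sD/2}\, \mathscr{H}^m(\mathcal{E}\cap\mathcal{C}(y))\right) \\
&= \frac{sD}{2} + \sup_{y\in\mathbb{R}^M} \log \mathscr{H}^m(\mathcal{E}\cap\mathcal{C}(y))
\end{aligned}
\tag{137}
$$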


where (a) holds because $\log$ is a monotonically increasing function, (b) holds because $e^{s(D - d(x,y))} > 0$, and (c) holds because $d(x,y) < D/2$ for all $x \in \mathcal{C}(y)$. Because $\mu\mathsf{x}^{-1}(\mathcal{E}) = 1$ (see (16)), the absolute continuity $\mu\mathsf{x}^{-1} \ll \mathscr{H}^m|_{\mathcal{E}}$ implies $\mathscr{H}^m(\mathcal{E}) > 0$. Thus, there exists a $y \in \mathbb{R}^M$ such that $\delta = \mathscr{H}^m(\mathcal{E}\cap\mathcal{C}(y)) > 0$. Clearly, this implies $\sup_{y\in\mathbb{R}^M} \log\mathscr{H}^m(\mathcal{E}\cap\mathcal{C}(y)) \ge \log\delta$, and hence, by (137),

$$
sD + \log\gamma(s) \ge \frac{sD}{2} + \log\delta. \tag{138}
$$

For fixed but arbitrary $K > 0$ and all $s > 2(K - \log\delta)/D$, we have $sD/2 + \log\delta > K$, and thus (138) implies

$$
sD + \log\gamma(s) > K.
$$

Since $K$ can be chosen arbitrarily large, this shows that $\lim_{s\to\infty}(sD + \log\gamma(s)) = \infty$.
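Inequality (138) can be checked numerically in a simple hypothetical setting: $\mathcal{E}$ is the unit circle, $\mathscr{H}^1$ is arc length, $d$ is the squared Euclidean distance, and $D = 0.5$. Since $\gamma(s)$ in (80) is a supremum over $y$, fixing any single $y_0$ gives $sD + \log\gamma(s) \ge \log\int_{\mathcal{E}} e^{s(D - d(x,y_0))}\, d\mathscr{H}^1(x)$, and for $y_0$ on the circle $\mathscr{H}^1(\mathcal{E}\cap\mathcal{C}(y_0)) = 2\arccos(1 - D/4)$. The sketch evaluates the integral with a midpoint rule and a log-sum-exp step to avoid overflow for large $s$.

```python
import math

D = 0.5  # hypothetical distortion level (0 < D <= 8 keeps acos well defined)


def log_integral(s, y_angle=0.0, n_points=20_000, dist=D):
    """log int_E exp(s (D - d(x, y))) dH^1(x), E = unit circle, y on E.

    d is the squared Euclidean distance; log-sum-exp avoids overflow.
    """
    yx, yy = math.cos(y_angle), math.sin(y_angle)
    step = 2 * math.pi / n_points
    exponents = []
    for k in range(n_points):
        phi = (k + 0.5) * step
        d = (math.cos(phi) - yx) ** 2 + (math.sin(phi) - yy) ** 2
        exponents.append(s * (dist - d))
    top = max(exponents)
    return top + math.log(step * sum(math.exp(e - top) for e in exponents))


def delta_c(dist=D):
    """H^1 of E intersected with C(y) for y on the circle: |x - y|^2 < D/2."""
    return 2 * math.acos(1 - dist / 4)


for s in (1.0, 10.0, 100.0, 1000.0):
    print(s, log_integral(s), s * D / 2 + math.log(delta_c()))
```

For large $s$ the left column grows roughly like $sD$, which is faster than the guaranteed $sD/2$ growth of the lower bound in (138).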


Appendix L
Proof of Theorem 60

Consider the source $\mathsf{x}$ on $\mathbb{R}^2$ as specified in Theorem 60. The main idea of the proof is to construct a specific source code and calculate its rate and expected distortion. We can then use the source coding theorem [25, Th. 11.4.1] to conclude that the calculated rate is an upper bound on the RD function.

To this end, recall that a $(k,n)$ source code for a sequence $\mathsf{x}_{1:k} \in (\mathbb{R}^2)^k$ of $k$ independent realizations of $\mathsf{x}$ consists of an encoding function $f\colon (\mathbb{R}^2)^k \to \{1,\dots,n\}$ and a decoding function $g\colon \{1,\dots,n\} \to (\mathbb{R}^2)^k$. The rate of this code is defined as $R_{f,g} = (\log n)/k$ and the expected distortion is given by

$$
D_{f,g} = \mathrm{E}_{\mathsf{x}_{1:k}}\!\left[\frac{1}{k}\,\big\|\mathsf{x}_{1:k} - g(f(\mathsf{x}_{1:k}))\big\|^2\right].
$$

By the source coding theorem [25, Th. 11.4.1], every $(k,n)$ code with expected distortion $D_{f,g}$ must have a rate greater than or equal to $R(D_{f,g})$. In particular, this has to hold for the special case $k = 1$. The rate of these $(1,n)$ codes reduces to $R_{f,g} = \log n$, and the expected distortion is given by

$$
D_{f,g} = \mathrm{E}_{\mathsf{x}}\big[\|\mathsf{x} - g(f(\mathsf{x}))\|^2\big]. \tag{139}
$$

Thus, the implication of the source coding theorem is that for a $(1,n)$ code with expected distortion $D_{f,g}$, we have

$$
\log n \ge R(D_{f,g}). \tag{140}
$$

We directly design the composed function $q = g \circ f$. Because $\mathsf{x}$ has probability zero outside $\mathcal{S}_1$, we only have to define $q$ on the unit circle. Furthermore, because $f$ maps $\mathsf{x}$ to one of at most $n$ distinct values, $q = g \circ f$ can also attain at most $n$ distinct values. We define $q$ to map each circle segment defined by an angle interval $[2\pi i/n, 2\pi(i+1)/n)$, $i \in \{0,\dots,n-1\}$, onto one associated “center” point, which is not constrained to lie on the unit circle. To this end, we only have to consider the circle segment defined by $\{x = (\cos\phi\ \sin\phi)^T : \phi \in [-\pi/n, \pi/n)\}$ since the problem is invariant under rotations. Because of symmetry, we choose the “center” associated with this segment to be some point $(x_1\ 0)^T$, i.e., $q(x) = (x_1\ 0)^T$ for all $x = (\cos\phi\ \sin\phi)^T$ with $\phi \in [-\pi/n, \pi/n)$. According to (139), the expected distortion is then obtained as

$$
\begin{aligned}
D_q = \mathrm{E}_{\mathsf{x}}\big[\|\mathsf{x} - q(\mathsf{x})\|^2\big]
&= \frac{n}{2\pi}\int_{-\pi/n}^{\pi/n} \big((\cos\phi - x_1)^2 + \sin^2\phi\big)\, d\phi \\
&= \frac{n}{2\pi}\int_{-\pi/n}^{\pi/n} \big(1 + x_1^2 - 2x_1\cos\phi\big)\, d\phi \\
&= 1 + x_1^2 - \frac{2n x_1}{\pi}\sin\frac{\pi}{n}.
\end{aligned}
\tag{141}
$$

Minimizing the expected distortion with respect to $x_1$ gives the optimum value of $x_1$ as

$$
x_1^* = \frac{n}{\pi}\sin\frac{\pi}{n}. \tag{142}
$$

The corresponding quantization function will be denoted by $q^*$. Inserting (142) into (141) yields $D_n$ in (98):

$$
D_{q^*} = \mathrm{E}_{\mathsf{x}}\big[\|\mathsf{x} - q^*(\mathsf{x})\|^2\big] = 1 + \frac{n^2}{\pi^2}\sin^2\frac{\pi}{n} - 2\,\frac{n^2}{\pi^2}\sin^2\frac{\pi}{n} = 1 - \frac{n^2}{\pi^2}\sin^2\frac{\pi}{n} = D_n.
$$

Thus, we found a $(1,n)$ code with expected distortion $D_{q^*} = D_n$. Hence, by (140), we have $\log n \ge R(D_{q^*}) = R(D_n)$, which is (97).
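The closed forms (141), (142), and $D_n$ can be cross-checked numerically. The sketch below assumes, as in the derivation of (141), that $\mathsf{x}$ is uniform on the unit circle. It uses segments centered at the angles $2\pi i/n$ (a rotation of the intervals in the text, which does not change the distortion), maps each segment to $x_1^*$ times the unit vector at its center angle, and averages the squared error over a fine angle grid. The function names are hypothetical.

```python
import math


def distortion(x1, n):
    """Closed form (141): expected distortion when each segment maps to x1."""
    return 1 + x1 ** 2 - 2 * n * x1 / math.pi * math.sin(math.pi / n)


def optimal_center(n):
    """Optimal radius x1* from (142)."""
    return n / math.pi * math.sin(math.pi / n)


def d_n(n):
    """D_n = 1 - (n / pi)^2 sin^2(pi / n)."""
    return 1 - (n / math.pi * math.sin(math.pi / n)) ** 2


def distortion_by_quadrature(n, n_points=200_000):
    """Average ||x - q*(x)||^2 over a uniform angle grid on the unit circle."""
    x1 = optimal_center(n)
    width = 2 * math.pi / n
    total = 0.0
    for k in range(n_points):
        phi = (k + 0.5) * 2 * math.pi / n_points
        i = math.floor((phi + width / 2) / width) % n  # segment index
        center_angle = i * width
        # |(cos phi, sin phi) - x1 (cos c, sin c)|^2 = 1 + x1^2 - 2 x1 cos(phi - c)
        total += 1 + x1 ** 2 - 2 * x1 * math.cos(phi - center_angle)
    return total / n_points


for n in (2, 4, 16):
    print(n, d_n(n), distortion_by_quadrature(n))
```

Moving the center away from $x_1^*$ in either direction increases (141) by exactly $(x_1 - x_1^*)^2$, which confirms that (142) is the minimizer.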